
B1.1 Using This Appendix
B1.2 Syntax
B1.3 Alphabetical List of ARM and Thumb Instructions
B1.4 ARM Assembler Quick Reference
B1.5 GNU Assembler Quick Reference

This appendix lists the ARM and Thumb instructions available up to, and including, ARM architecture ARMv6, which was just released at the time of writing. We list the operations in alphabetical order for easy reference. Sections B1.4 and B1.5 give quick reference guides to the ARM and GNU assemblers, armasm and gas, respectively.

We have designed this appendix for practical programming use, both for writing assembly code and for interpreting disassembly output. It is not intended as a definitive architectural ARM reference. In particular, we do not list the exhaustive details of each instruction bitmap encoding and behavior. For this level of detail, see the ARM Architecture Reference Manual, edited by David Seal, published by Addison Wesley. We do give a summary of ARM and Thumb instruction set encodings in Appendix B2.

## B1.1 Using This Appendix

Each appendix entry begins by enumerating the available instruction formats for the given instruction class. For example, the first entry for the instruction class ADD reads

```
1. ADD<cond>{S} Rd, Rn, #<rotated_immed>    ARMv1
```

The fields <cond> and <rotated_immed> are two of a number of standard fields described in Section B1.2. Rd and Rn denote ARM registers. The instruction is executed only if the condition <cond> is passed. Each entry also describes the action of the instruction if it is executed.

The $\{S\}$ denotes that you may apply an optional $S$ suffix to the instruction. Finally, the right-hand column specifies that the instruction is available from the listed ARM architecture version onwards. Table B1.1 shows the entries possible for this column.

## TABLE B1.1 Instruction types.

| Type | Meaning |
| :--- | :--- |
| ARMvX | 32-bit ARM instruction first appearing in ARM architecture version $X$ |
| THUMBvX | 16-bit Thumb instruction first appearing in Thumb architecture version $X$ |
| MACRO | Assembler pseudoinstruction |

Note that there is no direct correlation between the Thumb architecture number and the ARM architecture number. The THUMBv1 architecture is used in ARMv4T processors; the THUMBv2 architecture, in ARMv5T processors; and the THUMBv3 architecture, in ARMv6 processors.

Each instruction definition is followed by a notes section describing restrictions on the use of the instruction. When we make a statement such as "Rd must not be pc," we mean that the description of the function applies only when this condition holds. If you break the condition, then the instruction may be unpredictable or have predictable effects that we haven't had space to describe here. Well-written programs should not need to break these conditions.

## B1.2 Syntax

We use the following syntax and abbreviations throughout this appendix.

## Optional Expressions

■ {<expr>} is an optional expression. For example, LDR{B} is shorthand for LDR or LDRB.

■ {<exp1>|<exp2>|...|<expN>}, including at least one "|" divider, is a list of expressions. One of the listed expressions must appear. For example, LDR{B|H} is shorthand for LDRB or LDRH. It does not include LDR. We would represent all three possibilities by LDR{|B|H}.

## Register Names

■ Rd, Rn, Rm, Rs, RdHi, RdLo represent ARM registers in the range r0 to r15.
■ Ld, Ln, Lm, Ls represent low-numbered ARM registers in the range r0 to r7.
■ Hd, Hn, Hm, Hs represent high-numbered ARM registers in the range r8 to r15.
■ Cd, Cn, Cm represent coprocessor registers in the range c0 to c15.
■ sp, lr, pc are names for r13, r14, r15, respectively.
■ Rn[a] denotes bit a of register Rn. Therefore Rn[a] = (Rn >> a) & 1.
■ Rn[a:b] denotes the (a+1-b)-bit value stored in bits a to b of Rn inclusive.
■ RdHi:RdLo represents the 64-bit value with high 32 bits RdHi and low 32 bits RdLo.
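
The bit-field notation above maps directly onto C shifts and masks. The following is a small sketch, assuming 32-bit registers held in `uint32_t`; the helper names `bit`, `bits`, and `pair` are hypothetical and chosen only for illustration:

```c
#include <stdint.h>

/* Rn[a]: bit a of Rn, that is (Rn >> a) & 1 */
static uint32_t bit(uint32_t rn, unsigned a) {
    return (rn >> a) & 1u;
}

/* Rn[a:b]: the (a+1-b)-bit field in bits a down to b inclusive (requires 31 >= a >= b) */
static uint32_t bits(uint32_t rn, unsigned a, unsigned b) {
    unsigned width = a + 1 - b;
    uint32_t mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1u);
    return (rn >> b) & mask;
}

/* RdHi:RdLo: the 64-bit value with RdHi as the high word and RdLo as the low word */
static uint64_t pair(uint32_t rdhi, uint32_t rdlo) {
    return ((uint64_t)rdhi << 32) | rdlo;
}
```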

## Values Stored as Immediates

■ <immedN> is any unsigned N-bit immediate. For example, <immed8> represents any integer in the range 0 to 255. <immed5>*4 represents any integer in the list 0, 4, 8, ..., 124.

■ <addressN> is an address or label stored as a relative offset. The address must be in the range $pc - 2^{N} \leq \text{address} < pc + 2^{N}$. Here, pc is the address of the instruction plus eight for ARM state, or the address of the instruction plus four for Thumb state. The address must be four-byte aligned if the destination is an ARM instruction or two-byte aligned if the destination is a Thumb instruction.

■ <A-B> represents any integer in the range A to B inclusive.
■ <rotated_immed> is any 32-bit immediate that can be represented as an eight-bit unsigned value rotated right (or left) by an even number of bit positions. In other words, <rotated_immed> = <immed8> ROR (2*<immed4>). For example, 0xff, 0x104, 0x00000005, and 0x0bc00000 are possible values for <rotated_immed>. However, 0x101 and 0x102 are not, because their set bits cannot fit within a single eight-bit field at an even rotation. When you use a rotated immediate, <shifter_C> is set according to Table B1.3 (see Shift Operations below). A nonzero rotate may cause a change in the carry flag. For this reason, you can also specify the rotation explicitly, using the assembly syntax #<immed8>, 2*<immed4>.
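
One way to check whether a constant qualifies as a <rotated_immed> is to try each of the 16 even rotations and see whether undoing it leaves a value that fits in eight bits. The sketch below does exactly that. `ror32` and `encode_rotated_immed` are our own illustrative names, not part of any assembler's interface, and when several encodings exist this sketch simply returns the one with the smallest rotation (a real assembler may choose differently):

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value right by r bits (r taken modulo 32). */
static uint32_t ror32(uint32_t x, unsigned r) {
    r &= 31u;
    return r ? (x >> r) | (x << (32u - r)) : x;
}

/* Try to express value as <immed8> ROR (2*<immed4>).
   On success, fills *immed8 and *immed4 and returns true. */
static bool encode_rotated_immed(uint32_t value, uint32_t *immed8, uint32_t *immed4) {
    for (uint32_t rot = 0; rot < 16; rot++) {
        /* Undo ROR by 2*rot: rotating left by 2*rot equals ROR by (32 - 2*rot) */
        uint32_t candidate = ror32(value, (32u - 2u * rot) & 31u);
        if (candidate <= 0xffu) {
            *immed8 = candidate;
            *immed4 = rot;
            return true;
        }
    }
    return false; /* set bits do not fit in any even-rotated 8-bit window */
}
```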

## Condition Codes and Flags

<cond> represents any of the standard ARM condition codes. Table B1.2 shows the possible values for <cond>.

TABLE B1.2 ARM condition mnemonics.

| <cond> | Instruction is executed when | cpsr condition |
| :---: | :---: | :---: |
| {\|AL} | ALways | TRUE |
| EQ | EQual (last result zero) | Z==1 |
| NE | Not Equal (last result nonzero) | Z==0 |
| {CS\|HS} | Carry Set, unsigned Higher or Same (following a comparison) | C==1 |
| {CC\|LO} | Carry Clear, unsigned LOwer (following a comparison) | C==0 |
| MI | MInus (last result negative) | N==1 |
| PL | PLus (last result greater than or equal to zero) | N==0 |
| VS | V flag Set (signed overflow on last result) | V==1 |
| VC | V flag Clear (no signed overflow on last result) | V==0 |
| HI | unsigned HIgher (following a comparison) | C==1 && Z==0 |
| LS | unsigned Lower or Same (following a comparison) | C==0 \|\| Z==1 |
| GE | signed Greater than or Equal | N==V |
| LT | signed Less Than | N!=V |
| GT | signed Greater Than | N==V && Z==0 |
| LE | signed Less than or Equal | N!=V \|\| Z==1 |
| NV | NeVer (ARMv1 and ARMv2 only; DO NOT USE) | FALSE |
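
Table B1.2 can be read as a predicate on the four flags. The sketch below encodes it in C. The `Cond` enumeration lists the conditions in the order of the 4-bit condition field of ARM instructions (EQ = 0 through NV = 15), and `condition_passed` is a hypothetical name used only here:

```c
#include <stdbool.h>

/* Condition codes in condition-field order; HS and LO are aliases of CS and CC. */
typedef enum { EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL, NV } Cond;

/* True if an instruction with condition cond would execute, given flags N, Z, C, V (0 or 1). */
static bool condition_passed(Cond cond, int n, int z, int c, int v) {
    switch (cond) {
    case EQ: return z == 1;
    case NE: return z == 0;
    case CS: return c == 1;             /* also HS */
    case CC: return c == 0;             /* also LO */
    case MI: return n == 1;
    case PL: return n == 0;
    case VS: return v == 1;
    case VC: return v == 0;
    case HI: return c == 1 && z == 0;
    case LS: return c == 0 || z == 1;
    case GE: return n == v;
    case LT: return n != v;
    case GT: return n == v && z == 0;
    case LE: return n != v || z == 1;
    case AL: return true;
    default: return false;              /* NV: never (do not use) */
    }
}
```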

■ <SignedOverflow> is a flag indicating that the result of an arithmetic operation suffered from a signed overflow. For example, 0x7fffffff + 1 = 0x80000000 produces a signed overflow because the sum of two positive 32-bit signed integers is a negative 32-bit signed integer. The V flag in the cpsr typically records signed overflows.

■ <UnsignedOverflow> is a flag indicating that the result of an arithmetic operation suffered from an unsigned overflow. For example, 0xffffffff + 1 = 0 produces an overflow in unsigned 32-bit arithmetic. The C flag in the cpsr typically records unsigned overflows.

■ <NoUnsignedOverflow> is the same as 1 - <UnsignedOverflow>.

■ <Zero> is a flag indicating that the result of an arithmetic or logical operation is zero. The Z flag in the cpsr typically records the zero condition.

■ <Negative> is a flag indicating that the result of an arithmetic or logical operation is negative. In other words, <Negative> is bit 31 of the result. The N flag in the cpsr typically records this condition.
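
For addition, all four flags can be derived from a single 33-bit sum. Below is a minimal sketch built on a hypothetical `add_with_carry` helper that models the result of an ADDS or ADCS: pass `carry_in` as 0 for ADD and as the current C flag for ADC. The `Flags` struct is likewise our own:

```c
#include <stdint.h>

typedef struct { int n, z, c, v; } Flags;

/* Returns x + y + carry_in (mod 2^32) and sets N, Z, C, V as described above. */
static uint32_t add_with_carry(uint32_t x, uint32_t y, int carry_in, Flags *f) {
    uint64_t wide = (uint64_t)x + y + (uint32_t)carry_in; /* 33-bit sum */
    uint32_t r = (uint32_t)wide;
    f->n = (int)(r >> 31);                      /* <Negative>: bit 31 of the result */
    f->z = (r == 0);                            /* <Zero> */
    f->c = (int)(wide >> 32);                   /* <UnsignedOverflow>: carry out of bit 31 */
    f->v = (int)((~(x ^ y) & (x ^ r)) >> 31);   /* <SignedOverflow>: same-sign inputs, sign flipped */
    return r;
}
```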

## Shift Operations

■ <imm_shift> represents a shift by an immediate specified amount. The possible shifts are LSL #<0-31>, LSR #<1-32>, ASR #<1-32>, ROR #<1-31>, and RRX. See Table B1.3 for the actions of each shift.

■ <reg_shift> represents a shift by a register-specified amount. The possible shifts are LSL Rs, LSR Rs, ASR Rs, and ROR Rs. Rs must not be pc. The bottom eight bits of Rs are used as the shift value k in Table B1.3. Bits Rs[31:8] are ignored.
■ <shift> is shorthand for <imm_shift> or <reg_shift>.
■ <shifted_Rm> is shorthand for the value of Rm after the specified shift has been applied. See Table B1.3.
■ <shifter_C> is shorthand for the carry value output by the shifting circuit. See Table B1.3.

TABLE B1.3 Barrel shifter circuit outputs for different shift types.

| Shift | $k$ range | <shifted_Rm> | <shifter_C> |
| :---: | :---: | :---: | :---: |
| LSL k | $k=0$ | Rm | C (from cpsr) |
| LSL k | $1 \leq k \leq 31$ | Rm << k | Rm[32-k] |
| LSL k | $k=32$ | 0 | Rm[0] |
| LSL k | $k \geq 33$ | 0 | 0 |
| LSR k | $k=0$ | Rm | C |
| LSR k | $1 \leq k \leq 31$ | (unsigned)Rm >> k | Rm[k-1] |
| LSR k | $k=32$ | 0 | Rm[31] |
| LSR k | $k \geq 33$ | 0 | 0 |
| ASR k | $k=0$ | Rm | C |
| ASR k | $1 \leq k \leq 31$ | (signed)Rm >> k | Rm[k-1] |
| ASR k | $k \geq 32$ | -Rm[31] | Rm[31] |
| ROR k | $k=0$ | Rm | C |
| ROR k | $1 \leq k \leq 31$ | ((unsigned)Rm >> k) \| (Rm << (32-k)) | Rm[k-1] |
| ROR k | $k \geq 32$ | Rm ROR (k & 31) | Rm[(k-1) & 31] |
| RRX | | (C << 31) \| ((unsigned)Rm >> 1) | Rm[0] |
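
Table B1.3 translates into a straightforward C model of the barrel shifter for register-specified shifts, where k is Rs[7:0]. This is an illustrative sketch (the names `barrel_shift`, `rrx`, `ShiftType`, and `ShiftResult` are ours). It uses only unsigned operations so that the ASR case does not depend on C's implementation-defined right shift of negative values:

```c
#include <stdint.h>

typedef enum { LSL, LSR, ASR, ROR } ShiftType;
typedef struct { uint32_t value; int carry; } ShiftResult; /* <shifted_Rm>, <shifter_C> */

/* Register-specified shift: k = Rs[7:0] (0..255); c_in is the current cpsr C flag. */
static ShiftResult barrel_shift(ShiftType type, uint32_t rm, uint32_t k, int c_in) {
    ShiftResult r = { rm, c_in };              /* k == 0: Rm and C pass through unchanged */
    if (k == 0) return r;
    switch (type) {
    case LSL:
        r.value = (k < 32) ? rm << k : 0;
        r.carry = (k <= 32) ? (int)((rm >> (32 - k)) & 1u) : 0;   /* Rm[32-k] */
        break;
    case LSR:
        r.value = (k < 32) ? rm >> k : 0;
        r.carry = (k <= 32) ? (int)((rm >> (k - 1)) & 1u) : 0;    /* Rm[k-1] */
        break;
    case ASR:
        if (k < 32) {
            /* Fill the vacated top k bits with copies of the sign bit */
            r.value = (rm >> k) | ((rm >> 31) ? ~(0xffffffffu >> k) : 0u);
            r.carry = (int)((rm >> (k - 1)) & 1u);
        } else {
            r.value = (rm >> 31) ? 0xffffffffu : 0u;              /* -Rm[31] */
            r.carry = (int)(rm >> 31);
        }
        break;
    case ROR: {
        uint32_t s = k & 31u;
        r.value = s ? (rm >> s) | (rm << (32 - s)) : rm;
        r.carry = (int)((rm >> ((k - 1) & 31u)) & 1u);            /* Rm[(k-1) & 31] */
        break;
    }
    }
    return r;
}

/* RRX: rotate right by one bit through the carry flag. */
static ShiftResult rrx(uint32_t rm, int c_in) {
    ShiftResult r = { ((uint32_t)c_in << 31) | (rm >> 1), (int)(rm & 1u) };
    return r;
}
```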

## B1.3 Alphabetical List of ARM and Thumb Instructions

Instructions are listed in alphabetical order. However, where signed and unsigned variants of the same operation exist, the main entry is under the signed variant.

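ADC Add two 32-bit values and carry
```
1. ADC<cond>{S} Rd, Rn, #<rotated_immed>   ARMv1
2. ADC<cond>{S} Rd, Rn, Rm {, <shift>}     ARMv1
3. ADC Ld, Lm                              THUMBv1
```
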
| Action | Effect on the cpsr |
| :--- | :--- |
| 1. Rd = Rn + <rotated_immed> + C | Updated if S suffix specified |
| 2. Rd = Rn + <shifted_Rm> + C | Updated if S suffix specified |
| 3. Ld = Ld + Lm + C | Updated (see Notes below) |

Notes
■ If the operation updates the cpsr and Rd is not pc, then N = <Negative>, Z = <Zero>, C = <UnsignedOverflow>, V = <SignedOverflow>.

■ If Rd is pc, then the instruction effects a jump to the calculated address. If the operation updates the cpsr, then the processor mode must have an spsr; in this case, the cpsr is set to the value of the spsr.
■ If Rn or Rm is pc, then the value used is the address of the instruction plus eight bytes.

Examples

```
ADDS r0, r0, r2   ; first half of a 64-bit add
ADC  r1, r1, r3   ; second half of a 64-bit add
ADCS r0, r0, r0   ; shift r0 left, inserting carry (RLX)
```
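
The ADDS/ADC pair above propagates the carry out of the low word into the high word. Here is a C sketch of the same two-step 64-bit addition, using a hypothetical `add64_via_adc` helper that takes the four 32-bit halves (r0/r1 and r2/r3 in the example):

```c
#include <stdint.h>

/* Models ADDS lo, a_lo, b_lo followed by ADC hi, a_hi, b_hi. */
static uint64_t add64_via_adc(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi) {
    uint32_t lo = a_lo + b_lo;           /* ADDS r0, r0, r2 (wraps modulo 2^32) */
    uint32_t carry = (lo < a_lo);        /* C flag: the low-word add overflowed */
    uint32_t hi = a_hi + b_hi + carry;   /* ADC  r1, r1, r3 */
    return ((uint64_t)hi << 32) | lo;
}
```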

ADD Add two 32-bit values
```
1.  ADD<cond>{S} Rd, Rn, #<rotated_immed>   ARMv1
2.  ADD<cond>{S} Rd, Rn, Rm {, <shift>}     ARMv1
3.  ADD Ld, Ln, #<immed3>                   THUMBv1
4.  ADD Ld, #<immed8>                       THUMBv1
5.  ADD Ld, Ln, Lm                          THUMBv1
6.  ADD Hd, Lm                              THUMBv1
7.  ADD Ld, Hm                              THUMBv1
8.  ADD Hd, Hm                              THUMBv1
9.  ADD Ld, pc, #<immed8>*4                 THUMBv1
10. ADD Ld, sp, #<immed8>*4                 THUMBv1
11. ADD sp, #<immed7>*4                     THUMBv1
```

Action
1. Rd = Rn + <rotated_immed>
2. Rd = Rn + <shifted_Rm>
3. Ld = Ln + <immed3>
4. Ld = Ld + <immed8>
5. Ld = Ln + Lm
6. Hd = Hd + Lm
7. Ld = Ld + Hm
8. Hd = Hd + Hm
9. Ld = pc + 4*<immed8>
10. Ld = sp + 4*<immed8>
11. sp = sp + 4*<immed7>

## Notes

■ If the operation updates the cpsr and Rd is not pc, then N = <Negative>, Z = <Zero>, C = <UnsignedOverflow>, V = <SignedOverflow>.

■ If Rd or Hd is pc, then the instruction effects a jump to the calculated address. If the operation updates the cpsr, then the processor mode must have an spsr; in this case, the cpsr is set to the value of the spsr.
■ If Rn or Rm is pc, then the value used is the address of the instruction plus eight bytes.
■ If Hd or Hm is pc, then the value used is the address of the instruction plus four bytes.

## Examples

```
ADD  r0, r1, #4           ; r0 = r1 + 4
ADDS r0, r2, r2           ; r0 = r2 + r2 and flags updated
ADD  r0, r0, r0, LSL #1   ; r0 = 3*r0
ADD  pc, pc, r0, LSL #2   ; skip r0+1 instructions
ADD  r0, r1, r2, ROR r3   ; r0 = r1 + ((r2 >> r3) | (r2 << (32-r3)))
ADDS pc, lr, #4           ; jump to lr+4, restoring the cpsr
```

ADR Address relative
1. ADR{L}<cond> Rd, <address>    MACRO
This is not an ARM instruction, but an assembler macro that attempts to set Rd to the value <address> using a pc-relative calculation. The ADR instruction macro always uses a single ARM (or Thumb) instruction. The long version, ADRL, always uses two ARM instructions and so can access a wider range of addresses. If the assembler cannot generate an instruction sequence reaching the address, then it will generate an error.

The following example shows how to call the function pointed to by r9. We use ADR to set lr to the return address; in this case, it will assemble to ADD lr, pc, #4. Recall that pc reads as the address of the current instruction plus eight in this case.

```
        ADR lr, return_address  ; set return address
        MOV r0, #0              ; set a function argument
        BX  r9                  ; call the function
return_address                  ; resume
```

AND Logical bitwise AND of two 32-bit values
```
1. AND<cond>{S} Rd, Rn, #<rotated_immed>   ARMv1
2. AND<cond>{S} Rd, Rn, Rm {, <shift>}     ARMv1
3. AND Ld, Lm                              THUMBv1
```

| Action | Effect on the cpsr |
| :--- | :--- |
| 1. Rd = Rn & <rotated_immed> | Updated if S suffix specified |
| 2. Rd = Rn & <shifted_Rm> | Updated if S suffix specified |
| 3. Ld = Ld & Lm | Updated (see Notes below) |
Notes
■ If the operation updates the cpsr and Rd is not pc, then N = <Negative>, Z = <Zero>, C = <shifter_C> (see Table B1.3), V is preserved.
■ If Rd is pc, then the instruction effects a jump to the calculated address. If the operation updates the cpsr, then the processor mode must have an spsr; in this case, the cpsr is set to the value of the spsr.
■ If Rn or Rm is pc, then the value used is the address of the instruction plus eight bytes.

Example
```
AND  r0, r0, #0xFF    ; extract the lowest byte of r0
ANDS r0, r0, #1<<31   ; extract the sign bit
```

ASR Arithmetic shift right for Thumb (see MOV for the ARM equivalent)

```
1. ASR Ld, Lm, #<immed5>   THUMBv1
2. ASR Ld, Ls              THUMBv1
```

| Action | Effect on the cpsr |
| :--- | :--- |
| 1. Ld = Lm ASR #<immed5> | Updated (see Notes below) |
| 2. Ld = Ld ASR Ls[7:0] | Updated |

Note
■ The cpsr is updated: N = <Negative>, Z = <Zero>, C = <shifter_C> (see Table B1.3).

B Branch relative
```
1. B<cond> <address25>   ARMv1
2. B<cond> <address8>    THUMBv1
3. B <address11>         THUMBv1
```
Branches to the given address or label. The address is stored as a relative offset.
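
To see how <address25> relates to the stored offset, the following sketch computes the 24-bit word-offset field of an ARM-state B or BL and decodes it again. It assumes the pc = instruction address + 8 rule from Section B1.2 and an offset stored in words. The function names are hypothetical, the Thumb forms are not modeled, and address wrap-around past zero is ignored:

```c
#include <stdbool.h>
#include <stdint.h>

/* Encode: returns false if target is misaligned or outside pc - 2^25 .. pc + 2^25. */
static bool arm_branch_offset(uint32_t insn_addr, uint32_t target, uint32_t *field) {
    int64_t delta = (int64_t)target - ((int64_t)insn_addr + 8); /* relative to pc */
    if (delta % 4 != 0)
        return false;                                  /* ARM targets are word aligned */
    if (delta < -(INT64_C(1) << 25) || delta >= (INT64_C(1) << 25))
        return false;                                  /* out of range */
    *field = (uint32_t)(delta / 4) & 0x00ffffffu;      /* signed word offset, 24 bits */
    return true;
}

/* Decode: sign-extend the 24-bit field and add it (in words) to pc. */
static uint32_t arm_branch_target(uint32_t insn_addr, uint32_t field) {
    int64_t words = (field & 0x00800000u) ? (int64_t)field - 0x01000000 : (int64_t)field;
    return (uint32_t)((int64_t)insn_addr + 8 + words * 4);
}
```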

Examples

```
B   label   ; branch unconditionally to a label
BGT loop    ; conditionally continue a loop
```

BIC Logical bit clear (AND NOT) of two 32-bit values
```
1. BIC<cond>{S} Rd, Rn, #<rotated_immed>   ARMv1
2. BIC<cond>{S} Rd, Rn, Rm {, <shift>}     ARMv1
3. BIC Ld, Lm                              THUMBv1
```

| Action | Effect on the cpsr |
| :--- | :--- |
| 1. Rd = Rn & ~<rotated_immed> | Updated if S suffix specified |
| 2. Rd = Rn & ~<shifted_Rm> | Updated if S suffix specified |
| 3. Ld = Ld & ~Lm | Updated (see Notes below) |

## Notes

■ If the operation updates the cpsr and Rd is not pc, then N = <Negative>, Z = <Zero>, C = <shifter_C> (see Table B1.3), V is preserved.
■ If Rd is pc, then the instruction effects a jump to the calculated address. If the operation updates the cpsr, then the processor mode must have an spsr; in this case, the cpsr is set to the value of the spsr.
■ If Rn or Rm is pc, then the value used is the address of the instruction plus eight bytes.

Examples

```
BIC r0, r0, #1<<22   ; clear bit 22 of r0
```

BKPT Breakpoint instruction
```
1. BKPT <immed16>   ARMv5
2. BKPT <immed8>    THUMBv2
```
The breakpoint instruction causes a prefetch abort, unless overridden by debug hardware. The ARM ignores the immediate value. This immediate can be used to hold debug information such as the breakpoint number.

BL Relative branch with link (subroutine call)
```
1. BL<cond> <address25>   ARMv1
2. BL <address22>         THUMBv1
```

| Action | Effect on the cpsr |
| :--- | :--- |
| 1. lr = ret+0; pc = <address25> | None |
| 2. lr = ret+1; pc = <address22> | None |

## Note

■ These instructions set lr to the address of the following instruction, ret, plus the current cpsr T-bit setting. Therefore you can return from the subroutine using BX lr, which resumes execution at that address in the caller's ARM or Thumb state.

Examples

```
BL   subroutine   ; call subroutine (return with MOV pc, lr)
BLVS overflow     ; call subroutine on an overflow
```

BLX Branch with link and exchange (subroutine call with possible state switch)

| 1. BLX | <address25> | ARMv5 |
| :---: | :---: | :---: |
| 2. BLX<cond> | Rm | ARMv5 |
| 3. BLX | <address22> | THUMBv2 |
| 4. BLX | Rm | THUMBv2 |

| Action | | Effect on the cpsr |
| :---: | :---: | :---: |
| 1. lr = ret+0 | pc = <address25> | T=1 (switch to Thumb state) |
| 2. lr = ret+0 | pc = Rm & 0xfffffffe | T = Rm & 1 |
| 3. lr = ret+1 | pc = <address22> | T=0 (switch to ARM state) |
| 4. lr = ret+1 | pc = Rm & 0xfffffffe | T = Rm & 1 |

Notes

■ These instructions set lr to the address of the following instruction, ret, plus the current cpsr T-bit setting. Therefore you can return from the subroutine using BX lr, which resumes execution at that address in the caller's ARM or Thumb state.
■ Rm must not be pc.

■ Rm & 3 must not be 2. This would cause a branch to an unaligned ARM instruction.

Example

```
BLX thumb_code   ; call a Thumb subroutine from ARM state
BLX r0           ; call the subroutine pointed to by r0
                 ; ARM code if r0 even, Thumb if r0 odd
```
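
The register form of BLX can be modeled as a pure function of Rm, the return address ret, and the current T bit. This sketch, using hypothetical `blx_register` and `Interwork` names, shows how bit 0 of Rm selects the new state and how lr records the caller's state, so that a later BX lr (pc = lr & 0xfffffffe, T = lr & 1) returns correctly:

```c
#include <stdint.h>

typedef struct {
    uint32_t pc;  /* branch target */
    uint32_t lr;  /* return address, with the caller's T bit in bit 0 */
    int t;        /* new cpsr T bit: 1 = Thumb, 0 = ARM */
} Interwork;

/* BLX Rm: ret is the address of the following instruction, cur_t the current T bit. */
static Interwork blx_register(uint32_t rm, uint32_t ret, int cur_t) {
    Interwork w;
    w.lr = ret + (uint32_t)cur_t;  /* ret+0 from ARM state, ret+1 from Thumb state */
    w.pc = rm & 0xfffffffeu;       /* clear bit 0 to form the target address */
    w.t = (int)(rm & 1u);          /* bit 0 of Rm selects ARM or Thumb */
    return w;
}
```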

BX Branch with exchange (branch with possible state switch)
BXJ

| 1. BX<cond> | Rm | ARMv4T |
| :--- | :--- | :--- |
| 2. BX | Rm | THUMBv1 |
| 3. BXJ<cond> | Rm | ARMv5J |

| Action | Effect on the cpsr |
| :--- | :--- |
| 1. pc = Rm & 0xfffffffe | T = Rm & 1 |
| 2. pc = Rm & 0xfffffffe | T = Rm & 1 |
| 3. Depends on JE configuration bit | J, T affected |

Notes

■ If Rm is pc and the instruction is word aligned, then Rm takes the value of the current instruction plus eight in ARM state or plus four in Thumb state.

■ Rm & 3 must not be 2. This would cause a branch to an unaligned ARM instruction.
■ If the JE (Java Enable) configuration bit is clear, then BXJ behaves as a BX. Otherwise, the behavior is defined by the architecture of the Java Extension hardware. Typically it sets J = 1 in the cpsr and starts executing Java instructions from a general-purpose register designated as the Java program counter jpc.

## Examples

```
BX lr   ; return from ARM or Thumb subroutine
BX r0   ; branch to ARM or Thumb function pointer r0
```

CDP Coprocessor data processing operation

```
1. CDP<cond> <copro>, <op1>, Cd, Cn, Cm, <op2>   ARMv2
2. CDP2 <copro>, <op1>, Cd, Cn, Cm, <op2>        ARMv5
```

These instructions initiate a coprocessor-dependent operation. <copro> is the number of the coprocessor in the range p0 to p15. The core takes an undefined instruction trap if the coprocessor is not present. The coprocessor operation specifiers <op1> and <op2>, and the coprocessor register numbers Cd, Cn, Cm, are interpreted by the coprocessor and ignored by the ARM. CDP2 provides an additional set of coprocessor instructions.

CLZ Count leading zeros

```
1. CLZ<cond> Rd, Rm   ARMv5
```

Rd is set to the maximum left shift that can be applied to Rm without unsigned overflow. Equivalently, this is the number of zeros above the highest one in the binary representation of Rm. If Rm = 0, then Rd is set to 32. The following example normalizes the value in r0 so that bit 31 is set:

```
CLZ r1, r0 ; find normalization shift
MOV r0, r0, LSL r1 ; normalize so bit 31 is set (if r0!=0)
```
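
The following is a portable C model of CLZ, useful for checking the normalization example above. `clz32` is a hypothetical name, and the loop favors clarity over speed:

```c
#include <stdint.h>

/* Number of zero bits above the highest set bit of x; 32 when x == 0. */
static uint32_t clz32(uint32_t x) {
    uint32_t n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {  /* shift left until bit 31 is set */
        x <<= 1;
        n++;
    }
    return n;
}
```

For a nonzero x, `x << clz32(x)` has bit 31 set, which is the effect of the CLZ/MOV pair above.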

CMN Compare negative
```
1. CMN<cond> Rn, #<rotated_immed>   ARMv1
2. CMN<cond> Rn, Rm {, <shift>}     ARMv1
3. CMN Ln, Lm                       THUMBv1
```
Action
1. cpsr flags set on the result of (Rn + <rotated_immed>)
2. cpsr flags set on the result of (Rn + <shifted_Rm>)
3. cpsr flags set on the result of (Ln + Lm)
Notes
■ In the cpsr: N = <Negative>, Z = <Zero>, C = <UnsignedOverflow>, V = <SignedOverflow>. These are usually the same flags as generated by CMP with the second operand negated, although C can differ in edge cases (for example, when the second operand is zero).

■ If Rn or Rm is pc, then the value used is the address of the instruction plus eight bytes.

Example

```
CMN r0, #3   ; compare r0 with -3
BLT label    ; if (r0 < -3) goto label
```

CMP Compare two 32-bit integers

| 1. CMP<cond> | Rn, #<rotated_immed> | ARMv1 |
| :--- | :--- | :--- |
| 2. CMP<cond> | Rn, Rm {, <shift>} | ARMv1 |
| 3. CMP | Ln, #<immed8> | THUMBv1 |
| 4. CMP | Rn, Rm | THUMBv1 |

Action

1. cpsr flags set on the result of (Rn - <rotated_immed>)
2. cpsr flags set on the result of (Rn - <shifted_Rm>)
3. cpsr flags set on the result of (Ln - <immed8>)
4. cpsr flags set on the result of (Rn - Rm)

Notes
■ In the cpsr: N = <Negative>, Z = <Zero>, C = <NoUnsignedOverflow>, V = <SignedOverflow>. The carry flag is set this way because the subtract $x-y$ is implemented as the add $x+\sim y+1$. The carry flag is one if $x+\sim y+1$ overflows. This happens when $x \geq y$ (equivalently, when the unsigned subtraction $x-y$ doesn't underflow).
■ If Rn or Rm is pc, then the value used is the address of the instruction plus eight bytes for ARM instructions, or plus four bytes for Thumb instructions.
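
The carry rule for CMP can be checked with a C sketch that forms $x+\sim y+1$ in 64 bits, just as the note describes. `cmp_flags` and `Flags` are illustrative names, not part of any library:

```c
#include <stdint.h>

typedef struct { int n, z, c, v; } Flags;

/* Flags set by CMP x, y, computed via the add x + ~y + 1. */
static Flags cmp_flags(uint32_t x, uint32_t y) {
    uint64_t wide = (uint64_t)x + (uint64_t)(uint32_t)~y + 1u;
    uint32_t r = (uint32_t)wide;                  /* x - y modulo 2^32 */
    Flags f;
    f.n = (int)(r >> 31);
    f.z = (r == 0);
    f.c = (int)(wide >> 32);                      /* carry out: 1 exactly when x >= y (unsigned) */
    f.v = (int)(((x ^ y) & (x ^ r)) >> 31);       /* signs differ and result sign differs from x */
    return f;
}
```

With these flags, BHS (C == 1) is taken exactly when x >= y as unsigned values, which is what the CMP/BHS example relies on.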

Example

```
CMP r0, r1, LSR #2   ; compare r0 with (r1/4)
BHS label            ; if (r0 >= (r1/4)) goto label;
```

CPS Change processor state; modifies selected bits in the cpsr

| 1. CPS | #<mode> | ARMv6 |
| :--- | :--- | :--- |
| 2. CPSID | <flags> {, #<mode>} | ARMv6 |
| 3. CPSIE | <flags> {, #<mode>} | ARMv6 |
| 4. CPSID | <flags> | THUMBv3 |
| 5. CPSIE | <flags> | THUMBv3 |

Action
```
1. cpsr[4:0] = <mode>
2. cpsr = cpsr | mask; { cpsr[4:0] = <mode> }
3. cpsr = cpsr & ~mask; { cpsr[4:0] = <mode> }
4. cpsr = cpsr | mask
5. cpsr = cpsr & ~mask
```

Bits are set in mask according to letters in the <flags> value as in Table B1.4. The ID (interrupt disable) variants mask interrupts by setting cpsr bits. The IE (interrupt enable) variants unmask interrupts by clearing cpsr bits.

## TABLE B1.4 CPS flags characters.

| Character | cpsr bit affected | Bit set in mask |
| :---: | :--- | :---: |
| a | imprecise data Abort mask bit | 0x100 = 1 << 8 |
| i | IRQ mask bit | 0x080 = 1 << 7 |
| f | FIQ mask bit | 0x040 = 1 << 6 |
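
Table B1.4 amounts to a small lookup from letters to mask bits. Here is a sketch with the hypothetical names `cps_mask`, `cpsid`, and `cpsie`. It models only the interrupt-mask part of formats 4 and 5 (no mode change) and silently ignores characters other than a, i, and f:

```c
#include <stdint.h>

/* Build the mask for a <flags> string such as "if" or "aif". */
static uint32_t cps_mask(const char *flags) {
    uint32_t mask = 0;
    for (; *flags; flags++) {
        switch (*flags) {
        case 'a': mask |= 1u << 8; break;  /* imprecise data Abort mask bit */
        case 'i': mask |= 1u << 7; break;  /* IRQ mask bit */
        case 'f': mask |= 1u << 6; break;  /* FIQ mask bit */
        default: break;                    /* other characters ignored in this sketch */
        }
    }
    return mask;
}

/* CPSID <flags>: set mask bits, disabling the selected interrupts. */
static uint32_t cpsid(uint32_t cpsr, const char *flags) { return cpsr | cps_mask(flags); }

/* CPSIE <flags>: clear mask bits, enabling the selected interrupts. */
static uint32_t cpsie(uint32_t cpsr, const char *flags) { return cpsr & ~cps_mask(flags); }
```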

CPY Copy one ARM register to another without affecting the cpsr
```
1. CPY<cond> Rd, Rm   ARMv6
2. CPY Rd, Rm         THUMBv3
```

This assembles to MOV<cond> Rd, Rm except in the case of Thumb where Rd and Rm are both low registers in the range r0 to r7. In that case, it is a new operation that sets Rd = Rm without affecting the cpsr.

EOR Logical exclusive OR of two 32-bit values
```
1. EOR<cond>{S} Rd, Rn, #<rotated_immed>   ARMv1
2. EOR<cond>{S} Rd, Rn, Rm {, <shift>}     ARMv1
3. EOR Ld, Lm                              THUMBv1
```

| Action | Effect on the cpsr |
| :--- | :--- |
| 1. Rd = Rn ^ <rotated_immed> | Updated if S suffix specified |
| 2. Rd = Rn ^ <shifted_Rm> | Updated if S suffix specified |
| 3. Ld = Ld ^ Lm | Updated (see Notes below) |

Notes
■ If the operation updates the cpsr and Rd is not pc, then N = <Negative>, Z = <Zero>, C = <shifter_C> (see Table B1.3), V is preserved.
■ If Rd is pc, then the instruction effects a jump to the calculated address. If the operation updates the cpsr, then the processor mode must have an spsr; in this case, the cpsr is set to the value of the spsr.
■ If Rn or Rm is pc, then the value used is the address of the instruction plus eight bytes.

Example

```
EOR r0, r0, #1<<16   ; toggle bit 16
```

LDC Load to coprocessor single or multiple 32-bit values

| 1. LDC<cond>{L} | <copro>, Cd, [Rn {, #{-}<immed8>*4}]{!} | ARMv2 |
| :---: | :---: | :---: |
| 2. LDC<cond>{L} | <copro>, Cd, [Rn], #{-}<immed8>*4 | ARMv2 |
| 3. LDC<cond>{L} | <copro>, Cd, [Rn], <option> | ARMv2 |
| 4. LDC2{L} | <copro>, Cd, [Rn {, #{-}<immed8>*4}]{!} | ARMv5 |
| 5. LDC2{L} | <copro>, Cd, [Rn], #{-}<immed8>*4 | ARMv5 |
| 6. LDC2{L} | <copro>, Cd, [Rn], <option> | ARMv5 |

These instructions initiate a memory read, transferring data to the given coprocessor. <copro> is the number of the coprocessor in the range p0 to p15. The core takes an undefined instruction trap if the coprocessor is not present. The memory read consists of a sequence of words from sequentially increasing addresses. The initial address is specified by the addressing mode in Table B1.5. The coprocessor controls the number of words transferred, up to a maximum limit of 16 words. The fields {L} and Cd are interpreted by the coprocessor and ignored by the ARM. Typically Cd specifies the destination coprocessor register for the transfer. The <option> field is an eight-bit integer enclosed in {}. Its interpretation is coprocessor dependent.

TABLE B1.5 LDC addressing modes.

| Addressing format | Address accessed | Value written back to Rn |
| :--- | :--- | :--- |
| [Rn {, #{-}<immed>}] | Rn + {{-}<immed>} | Rn preserved |
| [Rn {, #{-}<immed>}]! | Rn + {{-}<immed>} | Rn + {{-}<immed>} |
| [Rn], #{-}<immed> | Rn | Rn + {-}<immed> |
| [Rn], <option> | Rn | Rn preserved |

If the address is not a multiple of four，then the access is unaligned．The restrictions on unaligned accesses are the same as for LDM．

LDM Load multiple 32-bit words from memory to ARM registers
```
1. LDM<cond><amode> Rn{!}, <register_list>{^}   ARMv1
2. LDMIA Rn!, <register_list>                   THUMBv1
```
These instructions load multiple words from sequential memory addresses. The <register_list> specifies a list of registers to load, enclosed in curly brackets {}. Although the assembler allows you to specify the registers in the list in any order, the order is not stored in the instruction, so it is good practice to write the list in increasing order of register number because this is the usual order of the memory transfer.

The following pseudocode shows the normal action of LDM. We use <register_list>[i] to denote the register appearing at position i in the list, starting at 0 for the first register. This assumes that the list is in order of increasing register number.

```
N = the number of registers in <register_list>
start = the lowest address accessed given in Table B1.6
for (i=0; i<N; i++)
    <register_list>[i] = memory(start+i*4, 4);
if (! is specified) then update Rn according to Table B1.6
```

Note that memory(a, 4) returns the four bytes at address a packed according to the current processor data endianness. If a is not a multiple of four, then the load is unaligned. Because the behavior of an unaligned load depends on the architecture revision, memory system, and system coprocessor (CP15) configuration, it's best to avoid unaligned loads if possible. Assuming that the external memory system does not abort unaligned loads, then the following rules usually apply:

- If the core has a system coprocessor and bit 1 (A-bit) or bit 22 (U-bit) of CP15:c1:c0:0 is set, then unaligned load multiples cause an alignment fault data abort exception.
- Otherwise the access ignores the bottom two address bits.

Table B1.6 lists the possible addressing modes specified by <amode>. If you specify the !, then the base address register is updated according to Table B1.6; otherwise it is preserved. Note that the lowest register number is always read from the lowest address.

The first half of the addressing mode mnemonics stands for Increment After, Increment Before, Decrement After, and Decrement Before, respectively. Increment modes load the registers sequentially forward, starting from address Rn (increment after) or Rn + 4 (increment before). Decrement modes have the same effect as if you loaded the register list backwards from sequentially descending memory addresses, starting from address Rn (decrement after) or Rn - 4 (decrement before).

## TABLE B1.6 LDM addressing modes.

| Addressing mode | Lowest address accessed | Highest address accessed | Value written back to Rn if ! specified |
| :--- | :--- | :--- | :--- |
| {IA\|FD} | Rn | Rn + N*4 - 4 | Rn + N*4 |
| {IB\|ED} | Rn + 4 | Rn + N*4 | Rn + N*4 |
| {DA\|FA} | Rn - N*4 + 4 | Rn | Rn - N*4 |
| {DB\|EA} | Rn - N*4 | Rn - 4 | Rn - N*4 |

The second half of the addressing mode mnemonics stands for the stack type you can implement with that address mode: Full Descending, Empty Descending, Full Ascending, and Empty Ascending. With a full stack, Rn points to the last stacked value; with an empty stack, Rn points to the first unused stack location. ARM stacks are usually full descending. You should use full descending or empty ascending stacks by preference, since LDC also supports these addressing modes.
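
Table B1.6 can be expressed as two small functions that return the lowest address accessed and the writeback value for N registers. This is a sketch with hypothetical names (`AMode`, `ldm_start`, `ldm_writeback`); the stack-type aliases noted in the comment are those listed in Table B1.6 for LDM:

```c
#include <stdint.h>

/* For LDM: FD = IA, ED = IB, FA = DA, EA = DB (Table B1.6). */
typedef enum { IA, IB, DA, DB } AMode;

/* Lowest address accessed by LDM<amode> Rn, {n registers}. */
static uint32_t ldm_start(AMode m, uint32_t rn, uint32_t n) {
    switch (m) {
    case IA: return rn;
    case IB: return rn + 4;
    case DA: return rn - n * 4 + 4;
    default: return rn - n * 4;      /* DB */
    }
}

/* Value written back to Rn when ! is specified. */
static uint32_t ldm_writeback(AMode m, uint32_t rn, uint32_t n) {
    return (m == IA || m == IB) ? rn + n * 4 : rn - n * 4;
}
```

For example, LDMIA r4!, {r0, r1} with r4 = 0x1000 reads from 0x1000 and leaves r4 = 0x1008, matching the first example below.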

## Notes

- For Thumb (format 2), $R n$ and the register list registers must be in the range $r 0$ to $r 7$.
- The number of registers $N$ in the list must be nonzero.
- Rn must not be $p c$.
- Rn must not appear in the register list if ! (writeback) is specified.
- If pc appears in the register list, then on ARMv5 and above the processor performs a BX to the loaded address. For ARMv4 and below, the processor branches to the loaded address.
- If ^ is specified, then the operation is modified. The processor must not be in user or system mode. If pc is not in the register list, then the registers appearing in the register list refer to the user mode versions of the registers and writeback must not be specified. If pc is in the register list, then the spsr is copied to the cpsr in addition to the standard operation.
- The time order of the memory accesses may depend on the implementation. Be careful when using a load multiple to access I/O locations where the access order matters. If the order matters, then check that the memory locations are marked as I/O in the page tables, do not cross page boundaries, and do not use $p c$ in the register list.


## Examples

```
LDMIA r4!, {r0, r1} ; r0=*r4, r1=*(r4+4), r4+=8
LDMDB r4!, {r0, r1} ; r1=*(r4-4), r0=*(r4-8), r4-=8
LDMEQFD sp!, {r0, pc} ; if (result zero) then unstack r0, pc
LDMFD sp, {sp}^ ; load sp_usr from sp_current
LDMFD sp!, {r0-pc}^ ; return from exception, restore cpsr
```

LDR Load a single value from a virtual address in memory

```
1.  LDR<cond>{|B}        Rd, [Rn {, #{-}<immed12>}]{!}          ARMv1
2.  LDR<cond>{|B}        Rd, [Rn, {-}Rm {, <imm_shift>}]{!}     ARMv1
3.  LDR<cond>{|B}{T}     Rd, [Rn], #{-}<immed12>                ARMv1
4.  LDR<cond>{|B}{T}     Rd, [Rn], {-}Rm {, <imm_shift>}        ARMv1
5.  LDR<cond>{H|SB|SH}   Rd, [Rn {, #{-}<immed8>}]{!}           ARMv4
6.  LDR<cond>{H|SB|SH}   Rd, [Rn, {-}Rm]{!}                     ARMv4
7.  LDR<cond>{H|SB|SH}   Rd, [Rn], #{-}<immed8>                 ARMv4
8.  LDR<cond>{H|SB|SH}   Rd, [Rn], {-}Rm                        ARMv4
9.  LDR<cond>D           Rd, [Rn {, #{-}<immed8>}]{!}           ARMv5E
10. LDR<cond>D           Rd, [Rn, {-}Rm]{!}                     ARMv5E
11. LDR<cond>D           Rd, [Rn], #{-}<immed8>                 ARMv5E
12. LDR<cond>D           Rd, [Rn], {-}Rm                        ARMv5E
13. LDREX<cond>          Rd, [Rn]                               ARMv6
14. LDR{|B|H}            Ld, [Ln, #<immed5>*<size>]             THUMBv1
15. LDR{|B|H|SB|SH}      Ld, [Ln, Lm]                           THUMBv1
16. LDR                  Ld, [pc, #<immed8>*4]                  THUMBv1
17. LDR                  Ld, [sp, #<immed8>*4]                  THUMBv1
18. LDR<cond><type>      Rd, <label>                            MACRO
19. LDR<cond>            Rd, =<32-bit-value>                    MACRO
```

Formats 1 to 17 load a single data item of the type specified by the opcode suffix, using a preindexed or postindexed addressing mode. Tables B1.7 and B1.8 show the different addressing modes and data types.

## TABLE B1.7 LDR addressing modes.

| Addressing format | Address a accessed | Value written back to Rn |
| :--- | :--- | :--- |
| `[Rn {, #{-}<immed>}]` | `Rn + {{-}<immed>}` | Rn preserved |
| `[Rn {, #{-}<immed>}]!` | `Rn + {{-}<immed>}` | `Rn + {{-}<immed>}` |
| `[Rn, {-}Rm {, <shift>}]` | `Rn + {-}<shifted_Rm>` | Rn preserved |
| `[Rn, {-}Rm {, <shift>}]!` | `Rn + {-}<shifted_Rm>` | `Rn + {-}<shifted_Rm>` |
| `[Rn], #{-}<immed>` | `Rn` | `Rn + {-}<immed>` |
| `[Rn], {-}Rm {, <shift>}` | `Rn` | `Rn + {-}<shifted_Rm>` |

In Table B1.8, memory(a, n) reads n sequential bytes from address a. The bytes are packed according to the configured processor data endianness. The function memoryT(a, n) performs the same access but with user mode privileges, regardless of the current processor mode. The function memoryEx(a, n) used by LDREX performs the access and marks the access as exclusive. If address a has the shared TLB attribute, then this marks address a as exclusive to the current processor and clears any other exclusive addresses for this processor. Otherwise the processor remembers that there is an outstanding exclusive access. Exclusivity only affects the action of the STREX instruction.

## TABLE B1.8 LDR datatypes.

| Load | Datatype | `<size>` (bytes) | Action |
| :---: | :---: | :---: | :---: |
| LDR | word | 4 | Rd = memory(a, 4) |
| LDRB | unsigned Byte | 1 | Rd = (zero-extend) memory(a, 1) |
| LDRBT | Byte Translated | 1 | Rd = (zero-extend) memoryT(a, 1) |
| LDRD | Double word | 8 | Rd = memory(a, 4); R(d+1) = memory(a+4, 4) |
| LDREX | word EXclusive | 4 | Rd = memoryEx(a, 4) |
| LDRH | unsigned Halfword | 2 | Rd = (zero-extend) memory(a, 2) |
| LDRSB | Signed Byte | 1 | Rd = (sign-extend) memory(a, 1) |
| LDRSH | Signed Halfword | 2 | Rd = (sign-extend) memory(a, 2) |
| LDRT | word Translated | 4 | Rd = memoryT(a, 4) |
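The zero-extend and sign-extend entries of Table B1.8 can be modelled in C on a byte array. This is a sketch that assumes a little-endian memory image. The helper names are hypothetical, and alignment and user-mode privileges (the T variants) are not modelled.

```c
#include <stdint.h>

/* Hypothetical model of Table B1.8 on a little-endian byte array. */
static uint32_t ldrb(const uint8_t *mem, uint32_t a) {
    return mem[a];                                          /* zero-extend */
}

static uint32_t ldrh(const uint8_t *mem, uint32_t a) {
    return (uint32_t)mem[a] | ((uint32_t)mem[a + 1] << 8);  /* zero-extend */
}

/* Sign-extend by flipping the sign bit and subtracting its weight.
   Unsigned arithmetic wraps modulo 2^32, so negative values come out
   as 32-bit two's-complement bit patterns. */
static uint32_t ldrsb(const uint8_t *mem, uint32_t a) {
    return (ldrb(mem, a) ^ 0x80u) - 0x80u;
}

static uint32_t ldrsh(const uint8_t *mem, uint32_t a) {
    return (ldrh(mem, a) ^ 0x8000u) - 0x8000u;
}
```

For a byte of 0x80, LDRB gives 0x00000080 while LDRSB gives 0xffffff80. The two forms only differ when the top bit of the loaded byte or halfword is set.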

If address a is not a multiple of <size>, then the load is unaligned. Because the behavior of an unaligned load depends on the architecture revision, memory system, and system coprocessor (CP15) configuration, it's best to avoid unaligned loads if possible. Assuming that the external memory system does not abort unaligned loads, the following rules usually apply. In the rules, A is bit 1 of system coprocessor register CP15:c1:c0:0, and U is bit 22 of CP15:c1:c0:0, introduced in ARMv6. If there is no system coprocessor, then A = U = 0.

- If A = 1, then unaligned loads cause an alignment fault data abort exception, except that word-aligned double-word loads are supported if U = 1.

- If A = 0 and U = 1, then unaligned loads are supported for LDR{|T|H|SH}. Word-aligned loads are supported for LDRD. A non-word-aligned LDRD generates an alignment fault data abort.

- If A = 0 and U = 0, then LDR and LDRT return the value memory(a & ~3, 4) ROR ((a & 3)*8). All other unaligned operations are unpredictable but do not generate an alignment fault.
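The A = 0, U = 0 rule for LDR can be checked with a small C model. The sketch assumes a little-endian memory image held in a byte array and ignores aborts. The function names are illustrative. The effect is that the byte at address a always ends up in bits [7:0], with the rest of the aligned word rotated in above it.

```c
#include <stdint.h>

/* Rotate right by n bits; n is reduced to 0..31. */
static uint32_t ror32(uint32_t x, unsigned n) {
    n &= 31;
    return n ? (x >> n) | (x << (32 - n)) : x;
}

/* Sketch of LDR/LDRT with A = 0 and U = 0 on a little-endian memory image:
   load the aligned word that contains a, then rotate by 8 * (a & 3). */
static uint32_t ldr_legacy_unaligned(const uint8_t *mem, uint32_t a) {
    uint32_t base = a & ~3u;
    uint32_t word = (uint32_t)mem[base]
                  | ((uint32_t)mem[base + 1] << 8)
                  | ((uint32_t)mem[base + 2] << 16)
                  | ((uint32_t)mem[base + 3] << 24);
    return ror32(word, (a & 3u) * 8);
}
```

With the bytes 0x11, 0x22, 0x33, 0x44 at addresses 0 to 3, a load from address 1 returns 0x11443322 rather than the 0x??443322 value a true unaligned load would give.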

Format 18 generates a pc-relative load accessing the address specified by <label>. In other words, it assembles to LDR<cond><type> Rd, [pc, #<offset>] whenever this instruction is supported and <offset> = <label> - pc is in range.

Format 19 generates an instruction to move the given 32-bit value to the register Rd. Usually the instruction is LDR<cond> Rd, [pc, #<offset>], where the 32-bit value is stored in a literal pool at address pc + <offset>.

## Notes

- For double-word loads (formats 9 to 12), Rd must be even and in the range r0 to r12.

- If the addressing mode updates Rn, then Rd and Rn must be distinct.
- If Rd is pc, then <size> must be 4. Up to ARMv4, the core branches to the loaded address. For ARMv5 and above, the core performs a BX to the loaded address.

- If Rn is pc, then the addressing mode must not update Rn. The value used for Rn is the address of the instruction plus eight bytes for ARM or four bytes for Thumb.
- Rm must not be pc.
- For ARMv6, use LDREX and STREX to implement semaphores rather than SWP.

Examples

| Instruction | Comment |
| :--- | :--- |
| `LDR r0, [r0]` | `r0 = *(int*)r0;` |
| `LDRSH r0, [r1], #<immed8>` | `r0 = *(short*)r1; r1 += <immed8>;` |

LSL Logical shift left for Thumb (see MOV for the ARM equivalent)
1. LSL Ld, Lm, #<immed5> THUMBv1
2. LSL Ld, Ls THUMBv1

Action and effect on the cpsr

```
1. Ld = Lm LSL #<immed5>    Updated (see Note below)
2. Ld = Ld LSL Ls[7:0]      Updated
```

Note
- The cpsr is updated: N = <Negative>, Z = <Zero>, C = <shifter_C> (see Table B1.3).

LSR Logical shift right for Thumb (see MOV for the ARM equivalent)
1. LSR Ld, Lm, #<immed5> THUMBv1
2. LSR Ld, Ls THUMBv1

Action

| Format | Action | Effect on the cpsr |
| :--- | :--- | :--- |
| 1 | `Ld = Lm LSR #<immed5>` | Updated (see Note below) |
| 2 | `Ld = Ld LSR Ls[7:0]` | Updated |

Note
- The cpsr is updated: N = <Negative>, Z = <Zero>, C = <shifter_C> (see Table B1.3).
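For the register forms of LSL and LSR, only Ls[7:0] counts, so shift amounts of 32 or more are legal. The C sketch below models the result and carry-out for LSL Ld, Ls and LSR Ld, Ls. It assumes the usual ARM register-shift rules for shifter_C: the carry is unchanged for a zero shift, is the last bit shifted out for shifts of 1 to 32, and is zero beyond 32. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Result and carry-out of a shift (the shifter_C value). */
typedef struct { uint32_t value; int carry; } shift_result;

/* LSL Ld, Ls: only Ls[7:0] is used; c_in is the current C flag. */
static shift_result lsl_reg(uint32_t x, uint32_t ls, int c_in) {
    unsigned n = ls & 0xffu;
    shift_result r;
    if (n == 0)       { r.value = x;      r.carry = c_in; }
    else if (n < 32)  { r.value = x << n; r.carry = (x >> (32 - n)) & 1u; }
    else if (n == 32) { r.value = 0;      r.carry = x & 1u; }
    else              { r.value = 0;      r.carry = 0; }
    return r;
}

/* LSR Ld, Ls: the mirror image of LSL. */
static shift_result lsr_reg(uint32_t x, uint32_t ls, int c_in) {
    unsigned n = ls & 0xffu;
    shift_result r;
    if (n == 0)       { r.value = x;      r.carry = c_in; }
    else if (n < 32)  { r.value = x >> n; r.carry = (x >> (n - 1)) & 1u; }
    else if (n == 32) { r.value = 0;      r.carry = x >> 31; }
    else              { r.value = 0;      r.carry = 0; }
    return r;
}
```

Note that a register value of 0x101 shifts by 1, not by 257, because only the bottom byte of Ls is used.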

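MCR Move to coprocessor from an ARM register
MCRR

1. MCR<cond> <copro>, <op1>, Rd, Cn, Cm, <op2> ARMv2
2. MCR2 <copro>, <op1>, Rd, Cn, Cm, <op2> ARMv5
3. MCRR<cond> <copro>, <op1>, Rd, Rn, Cm ARMv5E
4. MCRR2 <copro>, <op1>, Rd, Rn, Cm ARMv6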

These instructions transfer the value of ARM register Rd to the indicated coprocessor. Formats 3 and 4 also transfer a second register, Rn. <copro> is the number of the coprocessor in the range p0 to p15. The core takes an undefined instruction trap if the coprocessor is not present. The coprocessor operation specifiers <op1> and <op2>, and the coprocessor register numbers Cn, Cm, are interpreted by the coprocessor and ignored by the ARM. Rd and Rn must not be pc. Coprocessor p15 controls memory management options. For example, the following code sequence enables alignment fault checking:

```
MRC p15, 0, r0, c1, c0, 0 ; read the MMU register, c1
ORR r0, r0, #2 ; set the A bit
MCR p15, 0, r0, c1, c0, 0 ; write the MMU register, c1
```

MLA Multiply with accumulate
1. MLA<cond>{S} Rd, Rm, Rs, Rn ARMv2

Action
1. Rd = Rn + Rm*Rs

Effect on the cpsr
Updated if S suffix supplied

Notes
- Rd is set to the lower 32 bits of the result.
- Rd, Rm, Rs, Rn must not be pc.
- Rd and Rm must be different registers.
- Implementations may terminate early on the value of the Rs operand. For this reason use small or constant values for Rs where possible. See Appendix B3.

- If the cpsr is updated, then N = <Negative>, Z = <Zero>, C is unpredictable, and V is preserved. Avoid using the instruction MLAS because implementations often impose penalty cycles for this operation. Instead use MLA followed by a compare, and schedule the compare to avoid multiply result use interlocks.
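Because only the lower 32 bits of the result are kept, MLA produces the same bit pattern whether you read its operands as signed or unsigned. A C model makes this explicit. The function name is hypothetical, and the 64-bit intermediate is there only to make the truncation visible.

```c
#include <stdint.h>

/* Sketch of MLA Rd, Rm, Rs, Rn: compute Rn + Rm*Rs and keep the low 32 bits. */
static uint32_t mla(uint32_t rm, uint32_t rs, uint32_t rn) {
    uint64_t full = (uint64_t)rm * rs + rn;  /* exact product plus accumulator */
    return (uint32_t)full;                   /* Rd = lower 32 bits */
}
```

For example, treating 0xfffffffd as -3, mla(-3, 5, 1) yields the bit pattern of -14, exactly as a signed multiply would.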

MOV Move a 32-bit value into a register

| Format | Syntax | Architecture |
| :--- | :--- | :--- |
| 1 | `MOV<cond>{S} Rd, #<rotated_immed>` | ARMv1 |
| 2 | `MOV<cond>{S} Rd, Rm {, <shift>}` | ARMv1 |
| 3 | `MOV Ld, #<immed8>` | THUMBv1 |
| 4 | `MOV Ld, Ln` | THUMBv1 |
| 5 | `MOV Hd, Lm` | THUMBv1 |
| 6 | `MOV Ld, Hm` | THUMBv1 |
| 7 | `MOV Hd, Hm` | THUMBv1 |

## Action

| Format | Action | Effect on the cpsr |
| :--- | :--- | :--- |
| 1 | `Rd = <rotated_immed>` | Updated if S suffix specified |
| 2 | `Rd = <shifted_Rm>` | Updated if S suffix specified |
| 3 | `Ld = <immed8>` | Updated (see Notes below) |
| 4 | `Ld = Ln` | Updated (see Notes below) |
| 5 | `Hd = Lm` | Preserved |
| 6 | `Ld = Hm` | Preserved |
| 7 | `Hd = Hm` | Preserved |

Notes
- If the operation updates the cpsr and Rd is not pc, then N = <Negative>, Z = <Zero>, C = <shifter_C> (see Table B1.3), and V is preserved.
- If Rd or Hd is pc, then the instruction effects a jump to the calculated address. If the operation updates the cpsr, then the processor mode must have an spsr; in this case, the cpsr is set to the value of the spsr.

- If Rm is pc, then the value used is the address of the instruction plus eight bytes.
- If Hm is pc, then the value used is the address of the instruction plus four bytes.

## Examples

```
MOV r0, #0x00ff0000 ; r0 = 0x00ff0000
MOV r0, r1, LSL #2 ; r0 = 4*r1
MOV pc, lr ; return from subroutine (pc = lr)
MOVS pc, lr ; return from exception (pc = lr, cpsr = spsr)
```
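Whether MOV (or MVN) can take a constant directly depends on the <rotated_immed> field described in Section B1.2. Assuming the standard ARM encoding of an 8-bit value rotated right by an even number of bits (0, 2, ..., 30), the following C sketch tests a 32-bit constant and recovers the 8-bit value and rotation. The function name is hypothetical.

```c
#include <stdint.h>

/* Returns 1 if x is encodable as <rotated_immed>, that is, x equals some
   8-bit value rotated right by an even amount r. On success, *imm8 and
   *rot receive that value and r. Returns 0 otherwise. */
static int encode_rotated_immed(uint32_t x, uint32_t *imm8, unsigned *rot) {
    for (unsigned r = 0; r < 32; r += 2) {
        /* Undo a rotate right by r, i.e. rotate left by r. */
        uint32_t v = r ? (x << r) | (x >> (32 - r)) : x;
        if (v <= 0xffu) {
            *imm8 = v;
            *rot = r;
            return 1;
        }
    }
    return 0;
}
```

The first example above, 0x00ff0000, is encodable as 0xff rotated right by 16. A constant that fails, such as 0xffffff00, can still be loaded in one instruction with MVN because its complement, 0xff, is encodable; that is what MVN r0, #0xff does in the MVN entry.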

MRC Move to ARM register from a coprocessor
MRRC

1. MRC<cond> <copro>, <op1>, Rd, Cn, Cm, <op2> ARMv2
2. MRC2 <copro>, <op1>, Rd, Cn, Cm, <op2> ARMv5
3. MRRC<cond> <copro>, <op1>, Rd, Rn, Cm ARMv5E
4. MRRC2 <copro>, <op1>, Rd, Rn, Cm ARMv6

These instructions transfer a 32-bit value from the indicated coprocessor to the ARM register Rd. Formats 3 and 4 also transfer a second 32-bit value to Rn. <copro> is the number of the coprocessor in the range p0 to p15. The core takes an undefined instruction trap if the coprocessor is not present. The coprocessor operation specifiers <op1> and <op2>, and the coprocessor register numbers Cn, Cm, are interpreted by the coprocessor and ignored by the ARM. For formats 1 and 2, if Rd is pc, then the top four bits of the cpsr (the NZCV condition code flags) are set from the top four bits of the 32-bit value transferred; pc is not affected. For other formats, Rd and Rn must be distinct and not pc.

Coprocessor p15 controls memory management options. For example, the following instruction reads the main ID register from p15:

MRC p15, 0, r0, c0, c0, 0 ; read the main ID register, c0

MRS Move to ARM register from status register (cpsr or spsr)

| Format | Syntax | Architecture |
| :--- | :--- | :--- |
| 1 | `MRS<cond> Rd, cpsr` | ARMv3 |
| 2 | `MRS<cond> Rd, spsr` | ARMv3 |

These instructions set Rd = cpsr and Rd = spsr, respectively. Rd must not be pc.

MSR Move to status register (cpsr or spsr) from an ARM register
1. MSR<cond> cpsr_<fields>, #<rotated_immed> ARMv3
2. MSR<cond> cpsr_<fields>, Rm ARMv3
3. MSR<cond> spsr_<fields>, #<rotated_immed> ARMv3
4. MSR<cond> spsr_<fields>, Rm ARMv3
Action

```
1. cpsr = (cpsr & ~<mask>) | (<rotated_immed> & <mask>)
2. cpsr = (cpsr & ~<mask>) | (Rm & <mask>)
3. spsr = (spsr & ~<mask>) | (<rotated_immed> & <mask>)
4. spsr = (spsr & ~<mask>) | (Rm & <mask>)
```

These instructions alter selected bytes of the cpsr or spsr according to the value of <mask>. The <fields> specifier is a sequence of one or more letters, determining which bytes of <mask> are set. See Table B1.9.

## TABLE B1.9 Format of the <fields> specifier.

| `<fields>` letter | Meaning | Bits set in `<mask>` |
| :---: | :--- | :---: |
| c | Control byte | 0x000000ff |
| x | eXtension byte | 0x0000ff00 |
| s | Status byte | 0x00ff0000 |
| f | Flags byte | 0xff000000 |

Some old ARM toolkits allowed cpsr or cpsr_all in place of cpsr_fsxc. They also used cpsr_flg and cpsr_ctl in place of cpsr_f and cpsr_c, respectively. These formats, and the spsr equivalents, are obsolete, so you should not use them. The following example changes to system mode and enables IRQ, which is useful in a reentrant interrupt handler:

```
MRS r0, cpsr ; read cpsr state
BIC r0, r0, #0x9f ; clear IRQ disable and mode bits
ORR r0, r0, #0x1f ; set system mode
MSR cpsr_c,r0 ; update control byte of the cpsr
```
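The field letters of Table B1.9 and the update rule in the Action block can be modelled in C as a check on what a given MSR actually writes. In this sketch the function names are hypothetical, the starting cpsr value 0x600000d2 is an assumed example (IRQ mode with IRQ and FIQ disabled), and the privilege rules that limit which bytes user mode may write are not modelled.

```c
#include <stdint.h>

/* Build <mask> from a <fields> string such as "c" or "fsxc" (Table B1.9).
   Returns 0 if the string contains a letter other than c, x, s, or f. */
static uint32_t msr_mask(const char *fields) {
    uint32_t mask = 0;
    for (; *fields; fields++) {
        switch (*fields) {
        case 'c': mask |= 0x000000ffu; break;  /* control byte   */
        case 'x': mask |= 0x0000ff00u; break;  /* extension byte */
        case 's': mask |= 0x00ff0000u; break;  /* status byte    */
        case 'f': mask |= 0xff000000u; break;  /* flags byte     */
        default:  return 0;
        }
    }
    return mask;
}

/* MSR psr_<fields>, value: psr = (psr & ~mask) | (value & mask). */
static uint32_t msr(uint32_t psr, const char *fields, uint32_t value) {
    uint32_t mask = msr_mask(fields);
    return (psr & ~mask) | (value & mask);
}
```

With that starting value, the MRS/BIC/ORR/MSR sequence above produces 0x6000005f: system mode, IRQ enabled, FIQ still disabled, and the condition flags untouched because only the control byte is written.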

MUL Multiply
1. MUL<cond>{S} Rd, Rm, Rs ARMv2
2. MUL Ld, Lm THUMBv1

Action

```
1. Rd = Rm*Rs
2. Ld = Lm*Ld
```

Effect on the cpsr
Updated if S suffix supplied
Updated

Notes
- Rd or Ld is set to the lower 32 bits of the result.
- Rd, Rm, Rs must not be pc.
- Rd and Rm must be different registers. Similarly, Ld and Lm must be different.
- Implementations may terminate early on the value of the Rs or Ld operand. For this reason use small or constant values for Rs or Ld where possible.

- If the cpsr is updated, then N = <Negative>, Z = <Zero>, C is unpredictable, and V is preserved. Avoid using the instruction MULS because implementations often impose penalty cycles for this operation. Instead use MUL followed by a compare, and schedule the compare to avoid multiply result use interlocks.

MVN Move the logical not of a 32-bit value into a register
1. MVN<cond>{S} Rd, #<rotated_immed> ARMv1
2. MVN<cond>{S} Rd, Rm {, <shift>} ARMv1
3. MVN Ld, Lm THUMBv1

Action

| Format | Action | Effect on the cpsr |
| :--- | :--- | :--- |
| 1 | `Rd = ~<rotated_immed>` | Updated if S suffix specified |
| 2 | `Rd = ~<shifted_Rm>` | Updated if S suffix specified |
| 3 | `Ld = ~Lm` | Updated (see Notes below) |

Notes
- If the operation updates the cpsr and Rd is not pc, then N = <Negative>, Z = <Zero>, C = <shifter_C> (see Table B1.3), and V is preserved.
- If Rd is pc, then the instruction effects a jump to the calculated address. If the operation updates the cpsr, then the processor mode must have an spsr; in this case, the cpsr is set to the value of the spsr.

- If Rm is pc, then the value used is the address of the instruction plus eight bytes.

Examples

```
MVN r0, #0xff ; r0 = 0xffffff00
MVN r0, #0 ; r0 = -1
```

NEG Negate value in Thumb (use RSB to negate in ARM state)
1. NEG Ld, Lm THUMBv1

Action

| Format | Action | Effect on the cpsr |
| :--- | :--- | :--- |
| 1 | `Ld = -Lm` | Updated (see Notes below) |

Notes
- The cpsr is updated: N = <Negative>, Z = <Zero>, C = <NoUnsignedOverflow>, V = <SignedOverflow>. Note that Z = C and V = (Ld == 0x80000000).
- This is the same as the operation RSBS Ld, Lm, #0 in ARM state.
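The flag identities in the first note follow from computing 0 - Lm as 0 + ~Lm + 1. A minimal C model, using a hypothetical struct to hold the result and flags:

```c
#include <stdint.h>

/* Hypothetical container for a result and its N, Z, C, V flags. */
typedef struct { uint32_t result; int n, z, c, v; } nzcv_result;

/* NEG Ld, Lm modelled as RSBS Ld, Lm, #0, i.e. 0 + ~Lm + 1. */
static nzcv_result thumb_neg(uint32_t lm) {
    nzcv_result f;
    uint64_t sum = (uint64_t)(uint32_t)~lm + 1u;  /* bit 32 is the carry out */
    f.result = (uint32_t)sum;
    f.n = (int)(f.result >> 31);
    f.z = (f.result == 0);
    f.c = (int)(sum >> 32);                        /* set only when lm == 0 */
    /* Signed overflow of 0 - lm needs lm and the result both negative,
       which happens only for lm == 0x80000000. */
    f.v = (int)((lm & f.result) >> 31);
    return f;
}
```

Both Z and C are set exactly when Lm is zero, which is why the note says Z = C.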
NOP No operation
1. NOP MACRO

This is not an ARM instruction. It is an assembly macro that produces an instruction having no effect other than advancing the pc as normal. In ARM state it assembles to MOV r0, r0. In Thumb state it assembles to MOV r8, r8. The operation is not guaranteed to take one processor cycle. In particular, if you use NOP after a load of r0, then the operation may cause pipeline interlocks.

ORR Logical bitwise OR of two 32-bit values

| Format | Syntax | Architecture |
| :--- | :--- | :--- |
| 1 | `ORR<cond>{S} Rd, Rn, #<rotated_immed>` | ARMv1 |
| 2 | `ORR<cond>{S} Rd, Rn, Rm {, <shift>}` | ARMv1 |
| 3 | `ORR Ld, Lm` | THUMBv1 |

Action and effect on the cpsr

```
1. Rd = Rn | <rotated_immed>    Updated if S suffix specified
2. Rd = Rn | <shifted_Rm>       Updated if S suffix specified
3. Ld = Ld | Lm                 Updated (see Notes below)
```

Notes
- If the operation updates the cpsr and Rd is not pc, then N = <Negative>, Z = <Zero>, C = <shifter_C> (see Table B1.3), and V is preserved.
- If Rd is pc, then the instruction effects a jump to the calculated address. If the operation updates the cpsr, then the processor mode must have an spsr; in this case, the cpsr is set to the value of the spsr.
- If Rn or Rm is pc, then the value used is the address of the instruction plus eight bytes.

## Example

```
ORR r0, r0, #1<<13 ; set bit 13 of r0
```

PKH Pack 16-bit halfwords into a 32-bit word

```
1. PKHBT<cond> Rd, Rn, Rm {, LSL #<0-31>}   ARMv6
2. PKHTB<cond> Rd, Rn, Rm {, ASR #<1-32>}   ARMv6
```

Action
1. Rd[15:00] = Rn[15:00]; Rd[31:16] = <shifted_Rm>[31:16]
2. Rd[31:16] = Rn[31:16]; Rd[15:00] = <shifted_Rm>[15:00]

Note
- Rd, Rn, Rm must not be pc. The cpsr is not affected.

## Examples

```
PKHBT r0, r1, r2, LSL #16 ; r0 = (r2[15:00]<<16)|r1[15:00]
PKHTB r0, r2, r1, ASR #16 ; r0 = (r2[31:16]<<16)|r1[31:16]
```
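A C model of the two packing forms helps confirm the example comments. This is a sketch with hypothetical function names. The ASR is written out explicitly, because right-shifting a negative signed value is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* PKHBT Rd, Rn, Rm, LSL #sh (sh in 0..31):
   bottom half from Rn, top half from Rm LSL sh. */
static uint32_t pkhbt(uint32_t rn, uint32_t rm, unsigned sh) {
    assert(sh <= 31);
    return (rn & 0x0000ffffu) | ((rm << sh) & 0xffff0000u);
}

/* PKHTB Rd, Rn, Rm, ASR #sh (sh in 1..32):
   top half from Rn, bottom half from Rm ASR sh. */
static uint32_t pkhtb(uint32_t rn, uint32_t rm, unsigned sh) {
    assert(sh >= 1 && sh <= 32);
    uint32_t sign = (rm & 0x80000000u) ? 0xffffffffu : 0u;
    /* Arithmetic shift right: logical shift, then refill the top bits
       with copies of the sign bit. ASR #32 leaves only sign bits. */
    uint32_t shifted = (sh == 32) ? sign : (rm >> sh) | (sign << (32 - sh));
    return (rn & 0xffff0000u) | (shifted & 0x0000ffffu);
}
```

With r1 = 0x11112222 and r2 = 0x33334444, the two examples above give 0x44442222 and 0x33331111 respectively.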

PLD Preload hint instruction

```
1. PLD [Rn {, #{-}<immed12>}] ARMv5E
2. PLD [Rn, {-}Rm {,<imm_shift>}] ARMv5E
```

Action

```
1. Preloads from address (Rn + {{-}<immed12>})
2. Preloads from address (Rn + {-}<shifted_Rm>)
```

This instruction does not affect the processor registers (other than advancing pc). It merely hints that the programmer is likely to read from the given address in the future. A cached processor may take this as a hint to load the cache line containing the address into the cache. The instruction should not generate a data abort or any other memory system error. If Rn is pc, then the value used for Rn is the address of the instruction plus eight bytes. Rm must not be pc.

Examples

```
PLD [r0, #7] ; Preload from r0+7
PLD [r0, r1, LSL #2] ; Preload from r0+4*r1
```

POP Pops multiple registers from the stack in Thumb state (for ARM state use LDM)
1. POP <register_list> THUMBv1
Action
1. Equivalent to the ARM instruction LDMFD sp!, <register_list>

The <register_list> can contain registers in the range r0 to r7 and pc. The following example restores the low-numbered ARM registers and returns from a subroutine:

POP {r0-r7, pc}

PUSH Pushes multiple registers to the stack in Thumb state (for ARM state use STM)
1. PUSH <register_list> THUMBv1
Action
1. Equivalent to the ARM instruction STMFD sp!, <register_list>

The <register_list> can contain registers in the range r0 to r7 and lr. The following example saves the low-numbered ARM registers and the link register:

PUSH {r0-r7, lr}

QADD Saturated signed and unsigned arithmetic
QDSUB

| Format | Syntax | Architecture |
| :--- | :--- | :--- |
| 1 | `QADD<cond> Rd, Rm, Rn` | ARMv5E |
| 2 | `QDADD<cond> Rd, Rm, Rn` | ARMv5E |
| 3 | `QSUB<cond> Rd, Rm, Rn` | ARMv5E |
| 4 | `QDSUB<cond> Rd, Rm, Rn` | ARMv5E |
| 5 | `{U}QADD16<cond> Rd, Rn, Rm` | ARMv6 |
| 6 | `{U}QADDSUBX<cond> Rd, Rn, Rm` | ARMv6 |
| 7 | `{U}QSUBADDX<cond> Rd, Rn, Rm` | ARMv6 |
| 8 | `{U}QSUB16<cond> Rd, Rn, Rm` | ARMv6 |
| 9 | `{U}QADD8<cond> Rd, Rn, Rm` | ARMv6 |
| 10 | `{U}QSUB8<cond> Rd, Rn, Rm` | ARMv6 |

## Action

$$
\begin{aligned}
& \text{1. } \mathrm{Rd}=\operatorname{sat32}(\mathrm{Rm}+\mathrm{Rn}) \\
& \text{2. } \mathrm{Rd}=\operatorname{sat32}(\mathrm{Rm}+\operatorname{sat32}(2 \times \mathrm{Rn})) \\
& \text{3. } \mathrm{Rd}=\operatorname{sat32}(\mathrm{Rm}-\mathrm{Rn}) \\
& \text{4. } \mathrm{Rd}=\operatorname{sat32}(\mathrm{Rm}-\operatorname{sat32}(2 \times \mathrm{Rn})) \\
& \text{5. } \mathrm{Rd}[31{:}16]=\operatorname{sat16}(\mathrm{Rn}[31{:}16]+\mathrm{Rm}[31{:}16]); \\
& \quad \mathrm{Rd}[15{:}00]=\operatorname{sat16}(\mathrm{Rn}[15{:}00]+\mathrm{Rm}[15{:}00]) \\
& \text{6. } \mathrm{Rd}[31{:}16]=\operatorname{sat16}(\mathrm{Rn}[31{:}16]+\mathrm{Rm}[15{:}00]); \\
& \quad \mathrm{Rd}[15{:}00]=\operatorname{sat16}(\mathrm{Rn}[15{:}00]-\mathrm{Rm}[31{:}16]) \\
& \text{7. } \mathrm{Rd}[31{:}16]=\operatorname{sat16}(\mathrm{Rn}[31{:}16]-\mathrm{Rm}[15{:}00]); \\
& \quad \mathrm{Rd}[15{:}00]=\operatorname{sat16}(\mathrm{Rn}[15{:}00]+\mathrm{Rm}[31{:}16]) \\
& \text{8. } \mathrm{Rd}[31{:}16]=\operatorname{sat16}(\mathrm{Rn}[31{:}16]-\mathrm{Rm}[31{:}16]); \\
& \quad \mathrm{Rd}[15{:}00]=\operatorname{sat16}(\mathrm{Rn}[15{:}00]-\mathrm{Rm}[15{:}00]) \\
& \text{9. } \mathrm{Rd}[31{:}24]=\operatorname{sat8}(\mathrm{Rn}[31{:}24]+\mathrm{Rm}[31{:}24]); \\
& \quad \mathrm{Rd}[23{:}16]=\operatorname{sat8}(\mathrm{Rn}[23{:}16]+\mathrm{Rm}[23{:}16]); \\
& \quad \mathrm{Rd}[15{:}08]=\operatorname{sat8}(\mathrm{Rn}[15{:}08]+\mathrm{Rm}[15{:}08]); \\
& \quad \mathrm{Rd}[07{:}00]=\operatorname{sat8}(\mathrm{Rn}[07{:}00]+\mathrm{Rm}[07{:}00]) \\
& \text{10. } \mathrm{Rd}[31{:}24]=\operatorname{sat8}(\mathrm{Rn}[31{:}24]-\mathrm{Rm}[31{:}24]); \\
& \quad \mathrm{Rd}[23{:}16]=\operatorname{sat8}(\mathrm{Rn}[23{:}16]-\mathrm{Rm}[23{:}16]); \\
& \quad \mathrm{Rd}[15{:}08]=\operatorname{sat8}(\mathrm{Rn}[15{:}08]-\mathrm{Rm}[15{:}08]); \\
& \quad \mathrm{Rd}[07{:}00]=\operatorname{sat8}(\mathrm{Rn}[07{:}00]-\mathrm{Rm}[07{:}00])
\end{aligned}
$$

Notes
- The operations are signed unless the U prefix is present. For signed operations, $\operatorname{satN}(x)$ saturates $x$ to the range $-2^{N-1} \leq x<2^{N-1}$. For unsigned operations, $\operatorname{satN}(x)$ saturates $x$ to the range $0 \leq x<2^{N}$.

- For formats 1 to 4, the cpsr Q flag is set if saturation occurred; otherwise it is preserved. The ARMv6 formats 5 to 10 do not affect the Q flag.
- Rd, Rn, Rm must not be pc.
- The X operations are useful for packed complex numbers. The following examples assume bits [15:00] hold the real part and bits [31:16] the imaginary part.

Examples

| Instruction | Comment |
| :--- | :--- |
| `QDADD r0, r0, r2` | add Q30 value r2 to Q31 accumulator r0 |
| `QADD16 r0, r1, r2` | SIMD saturating add |
| `QADDSUBX r0, r1, r2` | r0 = r1 + i*r2 in packed complex arithmetic |
| `QSUBADDX r0, r1, r2` | r0 = r1 - i*r2 in packed complex arithmetic |
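The saturating behavior can be sketched in C. The model below covers formats 2, 5, and 6 with signed saturation, using the packed-complex convention from the Notes (real part in bits [15:00], imaginary part in bits [31:16]). It does not model the Q flag, and all helper names are hypothetical.

```c
#include <stdint.h>

/* sat16: clamp to the signed range -2^15 <= x < 2^15. */
static int32_t sat16(int32_t x) {
    return x > 32767 ? 32767 : (x < -32768 ? -32768 : x);
}

/* sat32: clamp a 64-bit intermediate to the signed 32-bit range. */
static int32_t sat32(int64_t x) {
    return x > INT32_MAX ? INT32_MAX : (x < INT32_MIN ? INT32_MIN : (int32_t)x);
}

/* Signed 16-bit lane of r starting at bit sh (0 or 16). */
static int32_t lane16(uint32_t r, unsigned sh) {
    return ((int32_t)((r >> sh) & 0xffffu) ^ 0x8000) - 0x8000;
}

static uint32_t pack16(int32_t hi, int32_t lo) {
    return ((uint32_t)hi << 16) | ((uint32_t)lo & 0xffffu);
}

/* Format 5, QADD16 Rd, Rn, Rm. */
static uint32_t qadd16(uint32_t rn, uint32_t rm) {
    return pack16(sat16(lane16(rn, 16) + lane16(rm, 16)),
                  sat16(lane16(rn, 0) + lane16(rm, 0)));
}

/* Format 6, QADDSUBX Rd, Rn, Rm: Rn + i*Rm for packed complex values. */
static uint32_t qaddsubx(uint32_t rn, uint32_t rm) {
    return pack16(sat16(lane16(rn, 16) + lane16(rm, 0)),
                  sat16(lane16(rn, 0) - lane16(rm, 16)));
}

/* Format 2, QDADD Rd, Rm, Rn: sat32(Rm + sat32(2*Rn)). */
static int32_t qdadd(int32_t rm, int32_t rn) {
    return sat32((int64_t)rm + sat32(2 * (int64_t)rn));
}
```

For instance, with r1 = 3 + 4i (0x00040003) and r2 = 1 + 2i (0x00020001), qaddsubx gives 1 + 5i (0x00050001), matching r1 + i*r2.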

REV Reverse bytes within a word or halfword.

| Format | Syntax | Architecture |
| :--- | :--- | :--- |
| 1 | `REV<cond> Rd, Rm` | ARMv6/THUMBv3 |
| 2 | `REV16<cond> Rd, Rm` | ARMv6/THUMBv3 |
| 3 | `REVSH<cond> Rd, Rm` | ARMv6/THUMBv3 |

Action

```
1. Rd[31:24] = Rm[07:00]; Rd[23:16] = Rm[15:08];
    Rd[15:08] = Rm[23:16]; Rd[07:00] = Rm[31:24]
2. Rd[31:24] = Rm[23:16]; Rd[23:16] = Rm[31:24];
    Rd[15:08] = Rm[07:00]; Rd[07:00] = Rm[15:08]
3. Rd[31:08] = sign-extend(Rm[07:00]); Rd[07:00] = Rm[15:08]
```

Notes
- Rd and Rm must not be pc.
- For Thumb, Rd and Rm must be in the range r0 to r7, and <cond> cannot be specified.
- These instructions are useful to convert big-endian data to little-endian and vice versa.

## Examples

```
REV r0, r0 ; switch endianness of a word
REV16 r0, r0 ; switch endianness of two packed halfwords
REVSH r0, r0 ; switch endianness of a signed halfword
```
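The three Action descriptions translate directly into C shifts and masks, which is handy for checking expected results on a host. The function names below are hypothetical.

```c
#include <stdint.h>

/* REV: reverse the four bytes of a word. */
static uint32_t rev(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* REV16: reverse the bytes within each halfword. */
static uint32_t rev16(uint32_t x) {
    return ((x >> 8) & 0x00ff00ffu) | ((x << 8) & 0xff00ff00u);
}

/* REVSH: swap the bytes of the low halfword, then sign-extend from bit 15
   (equivalently, Rd[31:08] = sign-extend(Rm[07:00])). */
static uint32_t revsh(uint32_t x) {
    uint32_t h = ((x >> 8) & 0xffu) | ((x & 0xffu) << 8);
    return (h ^ 0x8000u) - 0x8000u;
}
```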

RFE Return from exception

```
1. RFE<amode> Rn{!} ARMv6
```

This performs the operation that LDM<amode> Rn{!}, {pc, cpsr} would perform if LDM allowed a register list of {pc, cpsr}. See the entry for LDM.

ROR Rotate right for Thumb (see MOV for the ARM equivalent)
1. ROR Ld, Ls THUMBv1

Action

| Format | Action | Effect on the cpsr |
| :--- | :--- | :--- |
| 1 | `Ld = Ld ROR Ls[7:0]` | Updated |

Notes
- The cpsr is updated: N = <Negative>, Z = <Zero>, C = <shifter_C> (see Table B1.3).

RSB Reverse subtract of two 32-bit integers

```
1. RSB<cond>{S} Rd, Rn, #<rotated_immed>    ARMv1
2. RSB<cond>{S} Rd, Rn, Rm {, <shift>}      ARMv1
```

Action

| Format | Action | Effect on the cpsr |
| :--- | :--- | :--- |
| 1 | `Rd = <rotated_immed> - Rn` | Updated if S suffix present |
| 2 | `Rd = <shifted_Rm> - Rn` | Updated if S suffix present |

Notes
- If the operation updates the cpsr and Rd is not pc, then N = <Negative>, Z = <Zero>, C = <NoUnsignedOverflow>, and V = <SignedOverflow>. The carry flag is set this way because the subtract $x-y$ is implemented as the add $x+\sim y+1$. The carry flag is one if $x+\sim y+1$ overflows, that is, carries out of bit 31. This happens when $x \geq y$ as unsigned values, which is exactly when the unsigned subtraction $x-y$ does not borrow.
- If Rd is pc, then the instruction effects a jump to the calculated address. If the operation updates the cpsr, then the processor mode must have an spsr; in this case, the cpsr is set to the value of the spsr.
- If Rn or Rm is pc, then the value used is the address of the instruction plus eight bytes.

Examples

| Instruction | Comment |
| :--- | :--- |
| `RSB r0, r0, #0` | r0 = -r0 |
| `RSB r0, r1, r1, LSL #3` | r0 = 7*r1 |
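The carry rule in the first note can be checked in C by performing the subtraction exactly as described, as x + ~y + 1 in 33-bit precision. This is hypothetical host code. For RSBS, pass the second operand (the immediate or shifted Rm) as x and Rn as y.

```c
#include <stdint.h>

/* Hypothetical container for a subtraction result and its C and V flags. */
typedef struct { uint32_t result; int c, v; } sub_flags;

/* x - y computed as x + ~y + 1. C is the carry out of bit 31
   (1 when x >= y unsigned, i.e. no borrow); V is signed overflow. */
static sub_flags sub_with_flags(uint32_t x, uint32_t y) {
    uint64_t sum = (uint64_t)x + (uint64_t)(uint32_t)~y + 1u;
    sub_flags f;
    f.result = (uint32_t)sum;
    f.c = (int)(sum >> 32);
    /* Overflow when x and y have different signs and the result's
       sign differs from x's. */
    f.v = (int)((((x ^ y) & (x ^ f.result)) >> 31) & 1u);
    return f;
}
```

For example, sub_with_flags(8 * r1, r1) reproduces the value and flags of RSBS r0, r1, r1, LSL #3.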

RSC Reverse subtract with carry of two 32-bit integers
1. RSC<cond>{S} Rd, Rn, #<rotated_immed> ARMv1
2. RSC<cond>{S} Rd, Rn, Rm {, <shift>} ARMv1

Action

| Format | Action | Effect on the cpsr |
| :--- | :--- | :--- |
| 1 | `Rd = <rotated_immed> - Rn - (~C)` | Updated if S suffix present |
| 2 | `Rd = <shifted_Rm> - Rn - (~C)` | Updated if S suffix present |

Notes
- If the operation updates the cpsr and Rd is not pc, then N = <Negative>, Z = <Zero>, C = <NoUnsignedOverflow>, V = <SignedOverflow>. The carry flag is set this way because the subtract $x-y-\sim C$ is implemented as the add $x+\sim y+C$. The carry flag is one if $x+\sim y+C$ overflows. This happens when the unsigned subtraction $x-y-\sim C$ does not borrow.
- If Rd is pc, then the instruction effects a jump to the calculated address. If the operation updates the cpsr, then the processor mode must have an spsr; in this case, the cpsr is set to the value of the spsr.
- If Rn or Rm is pc, then the value used is the address of the instruction plus eight bytes.

The following example negates a 64-bit integer where r0 is the low 32 bits and r1 the high 32 bits.

| Instruction | Comment |
| :--- | :--- |
| `RSBS r0, r0, #0` | r0 = -r0, C = NOT(borrow) |
| `RSC r1, r1, #0` | r1 = -r1 - borrow |
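The RSBS/RSC pair can be mirrored in C to confirm that the borrow propagates correctly between the two halves. In this sketch the function name is hypothetical, and lo and hi play the roles of r0 and r1.

```c
#include <stdint.h>

/* 64-bit negate split across two 32-bit halves, modelled on
       RSBS r0, r0, #0   ; low word,  C = NOT(borrow)
       RSC  r1, r1, #0   ; high word, 0 - hi - NOT(C)          */
static void neg64(uint32_t *lo, uint32_t *hi) {
    int c = (*lo == 0);                  /* 0 - lo borrows unless lo == 0 */
    *lo = 0u - *lo;
    *hi = 0u - *hi - (uint32_t)(1 - c);  /* subtract the borrow, NOT(C) */
}
```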

SADD Parallel modulo add and subtract operations

| Format | Syntax | Architecture |
| :--- | :--- | :--- |
| 1 | `{S\|U}ADD16<cond> Rd, Rn, Rm` | ARMv6 |
| 2 | `{S\|U}ADDSUBX<cond> Rd, Rn, Rm` | ARMv6 |
| 3 | `{S\|U}SUBADDX<cond> Rd, Rn, Rm` | ARMv6 |
| 4 | `{S\|U}SUB16<cond> Rd, Rn, Rm` | ARMv6 |
| 5 | `{S\|U}ADD8<cond> Rd, Rn, Rm` | ARMv6 |
| 6 | `{S\|U}SUB8<cond> Rd, Rn, Rm` | ARMv6 |

Action and effect on the cpsr

```
1. Rd[31:16]=Rn[31:16]+Rm[31:16]; GE3=GE2=cmn(Rn[31:16],Rm[31:16])
    Rd[15:00]=Rn[15:00]+Rm[15:00] GE1=GE0=cmn(Rn[15:00],Rm[15:00])
2. Rd[31:16]=Rn[31:16]+Rm[15:00]; GE3=GE2=cmn(Rn[31:16],Rm[15:00])
    Rd[15:00]=Rn[15:00]-Rm[31:16] GE1=GE0=(Rn[15:00] >= Rm[31:16])
3. Rd[31:16]=Rn[31:16]-Rm[15:00]; GE3=GE2=(Rn[31:16] >= Rm[15:00])
    Rd[15:00]=Rn[15:00]+Rm[31:16] GE1=GE0=cmn(Rn[15:00],Rm[31:16])
4. Rd[31:16]=Rn[31:16]-Rm[31:16]; GE3=GE2=(Rn[31:16] >= Rm[31:16])
    Rd[15:00]=Rn[15:00]-Rm[15:00] GE1=GE0=(Rn[15:00] >= Rm[15:00])
5. Rd[31:24]=Rn[31:24]+Rm[31:24];GE3 = cmn(Rn[31:24],Rm[31:24])
    Rd[23:16]=Rn[23:16]+Rm[23:16]; GE2 = cmn(Rn[23:16],Rm[23:16])
    Rd[15:08]=Rn[15:08]+Rm[15:08]; GE1 = cmn(Rn[15:08],Rm[15:08])
    Rd[07:00]=Rn[07:00]+Rm[07:00] GE0 = cmn(Rn[07:00],Rm[07:00])
6. Rd[31:24]=Rn[31:24]-Rm[31:24]; GE3 = (Rn[31:24] >= Rm[31:24])
    Rd[23:16]=Rn[23:16]-Rm[23:16]; GE2 = (Rn[23:16] >= Rm[23:16])
    Rd[15:08]=Rn[15:08]-Rm[15:08]; GE1 = (Rn[15:08] >= Rm[15:08])
    Rd[07:00]=Rn[07:00]-Rm[07:00] GE0 = (Rn[07:00] >= Rm[07:00])
```

Notes
- If you specify the S prefix, then all comparisons are signed. The $\operatorname{cmn}(x, y)$ function returns $x \geq -y$ or, equivalently, $x+y \geq 0$.

- If you specify the U prefix, then all comparisons are unsigned. The $\operatorname{cmn}(x, y)$ function returns 1 if the unsigned addition $x+y$ produces a carry out of the lane, that is, if $x+y \geq 2^{16}$ for halfword lanes or $x+y \geq 2^{8}$ for byte lanes.
- Rd, Rn, and Rm must not be pc.
- The X operations are useful for packed complex numbers. The following examples assume bits [15:00] hold the real part and bits [31:16] the imaginary part.

## Examples

```
SADD16 r0, r1, r2 ; Signed 16-bit SIMD add
SADDSUBX r0, r1, r2 ; r0=r1+i*r2 in packed complex arithmetic
SSUBADDX r0, r1, r2 ; r0=r1-i*r2 in packed complex arithmetic
```
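A C sketch of format 1 with both prefixes shows how the GE bits follow the cmn() rules in the Notes while the results themselves simply wrap. The function name, the is_signed switch, and returning GE[3:0] through a pointer are all illustrative assumptions.

```c
#include <stdint.h>

/* Sketch of SADD16 (is_signed = 1) and UADD16 (is_signed = 0): two
   independent 16-bit additions modulo 2^16. *ge receives GE[3:0]. */
static uint32_t add16(uint32_t rn, uint32_t rm, int is_signed, unsigned *ge) {
    uint32_t result = 0;
    *ge = 0;
    for (unsigned lane = 0; lane < 2; lane++) {
        unsigned sh = 16 * lane;
        uint32_t a = (rn >> sh) & 0xffffu, b = (rm >> sh) & 0xffffu;
        int ge_lane;
        if (is_signed) {
            int32_t sa = ((int32_t)a ^ 0x8000) - 0x8000;  /* sign-extend lane */
            int32_t sb = ((int32_t)b ^ 0x8000) - 0x8000;
            ge_lane = (sa + sb >= 0);                     /* S: x + y >= 0 */
        } else {
            ge_lane = (a + b >= 0x10000u);                /* U: carry out */
        }
        result |= ((a + b) & 0xffffu) << sh;              /* modulo result */
        if (ge_lane)
            *ge |= 3u << (2 * lane);                      /* GE1:GE0 or GE3:GE2 */
    }
    return result;
}
```

Adding 0x0001ffff and 0x00010001 gives 0x00020000 either way, but the GE bits differ: UADD16 sets only GE1:GE0 (the low lane carried), while SADD16 sets all four (both signed sums are non-negative).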

SBC Subtract with carry

```
1. SBC<cond>{S} Rd, Rn, #<rotated_immed> ARMv1
2. SBC<cond>{S} Rd, Rn, Rm {, <shift>} ARMv1
3. SBC Ld, Lm THUMBv1
```

Action

| Format | Action | Effect on the cpsr |
| :--- | :--- | :--- |
| 1 | `Rd = Rn - <rotated_immed> - (~C)` | Updated if S suffix specified |
| 2 | `Rd = Rn - <shifted_Rm> - (~C)` | Updated if S suffix specified |
| 3 | `Ld = Ld - Lm - (~C)` | Updated (see Notes below) |

Notes
- If the operation updates the cpsr and Rd is not pc, then N = <Negative>, Z = <Zero>, C = <NoUnsignedOverflow>, V = <SignedOverflow>. The carry flag is set this way because the subtract $x-y-\sim C$ is implemented as the add $x+\sim y+C$. The carry flag is one if $x+\sim y+C$ overflows. This happens when the unsigned subtraction $x-y-\sim C$ does not borrow.

- If Rd is pc, then the instruction effects a jump to the calculated address. If the operation updates the cpsr, then the processor mode must have an spsr. In this case the cpsr is set to the value of the spsr.

- If Rn or Rm is pc, then the value used is the address of the instruction plus eight bytes.

The following example implements a 64-bit subtract:

```
SUBS r0, r0, r2 ; subtract low words, C=NOT(borrow)
SBC r1, r1, r3 ; subtract high words and borrow
```
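The same borrow chain can be modelled in C. In this sketch the function name is hypothetical; (a_lo, a_hi) play the roles of r0 and r1, and (b_lo, b_hi) those of r2 and r3.

```c
#include <stdint.h>

/* 64-bit subtract from 32-bit halves, modelled on
       SUBS r0, r0, r2   ; low words,  C = NOT(borrow)
       SBC  r1, r1, r3   ; high words, minus NOT(C)     */
static void sub64(uint32_t *a_lo, uint32_t *a_hi, uint32_t b_lo, uint32_t b_hi) {
    int c = (*a_lo >= b_lo);                     /* carry = NOT(borrow) */
    *a_lo = *a_lo - b_lo;
    *a_hi = *a_hi - b_hi - (uint32_t)(1 - c);    /* SBC subtracts NOT(C) */
}
```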

SEL Select between two source operands based on the GE flags

1. SEL<cond> Rd, Rn, Rm ARMv6

Action

```
1. Rd[31:24] = GE3 ? Rn[31:24] : Rm[31:24];
    Rd[23:16] = GE2 ? Rn[23:16] : Rm[23:16];
    Rd[15:08] = GE1 ? Rn[15:08] : Rm[15:08];
    Rd[07:00] = GE0 ? Rn[07:00] : Rm[07:00]
```

Notes
- Rd, Rn, Rm must not be pc.

- See SADD for instructions that set the GE flags in the cpsr.
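A common use of SEL is a per-byte maximum: USUB8 (format 6 of SADD with the U prefix) sets GE[i] where byte i of Rn is at least byte i of Rm, and SEL then picks the larger byte. The C sketch below models that two-instruction sequence. The function names are hypothetical, and the GE bits are passed explicitly rather than held in a model cpsr.

```c
#include <stdint.h>

/* USUB8 Rd, Rn, Rm: four modulo-2^8 subtractions; GE[i] is set when
   byte i of Rn >= byte i of Rm (unsigned). */
static uint32_t usub8(uint32_t rn, uint32_t rm, unsigned *ge) {
    uint32_t result = 0;
    *ge = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint32_t a = (rn >> (8 * i)) & 0xffu, b = (rm >> (8 * i)) & 0xffu;
        result |= ((a - b) & 0xffu) << (8 * i);
        if (a >= b)
            *ge |= 1u << i;
    }
    return result;
}

/* SEL Rd, Rn, Rm: byte i comes from Rn if GE[i] is set, else from Rm. */
static uint32_t sel(unsigned ge, uint32_t rn, uint32_t rm) {
    uint32_t mask = 0;
    for (unsigned i = 0; i < 4; i++)
        if (ge & (1u << i))
            mask |= 0xffu << (8 * i);
    return (rn & mask) | (rm & ~mask);
}

/* Per-byte unsigned maximum: USUB8 r0, r1, r2 followed by SEL r0, r1, r2. */
static uint32_t max_u8x4(uint32_t r1, uint32_t r2) {
    unsigned ge;
    (void)usub8(r1, r2, &ge);
    return sel(ge, r1, r2);
}
```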

SETEND Set the endianness for data accesses

1. SETEND BE ARMv6/THUMBv3
2. SETEND LE ARMv6/THUMBv3

Action

1. In the cpsr, E = 1, so data accesses will be big-endian.
2. In the cpsr, E = 0, so data accesses will be little-endian.

## Note

- ARMv6 uses a byte-invariant endianness model. This means that byte loads and stores are not affected by the configured endianness. For little-endian data accesses, the byte at the lowest address appears in the least significant byte of the loaded word. For big-endian data accesses, the byte at the lowest address appears in the most significant byte of the loaded word.
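The byte-invariant model can be illustrated with a small C function. Memory is treated as an array of bytes that SETEND never changes; only the way a word load assembles those bytes depends on the E bit. The function name and the big_endian parameter are hypothetical.

```c
#include <stdint.h>

/* Byte-invariant word load: the bytes in memory are fixed; E only decides
   which end of the word the byte at the lowest address goes to. */
static uint32_t load_word(const uint8_t *mem, uint32_t a, int big_endian) {
    uint32_t w = 0;
    for (unsigned i = 0; i < 4; i++) {
        /* Little-endian: byte a+i goes to bits [8i+7:8i].
           Big-endian:    byte a+i goes to bits [31-8i:24-8i]. */
        unsigned shift = big_endian ? 8 * (3 - i) : 8 * i;
        w |= (uint32_t)mem[a + i] << shift;
    }
    return w;
}
```

With the bytes 0x11, 0x22, 0x33, 0x44 at addresses 0 to 3, a little-endian word load returns 0x44332211 and a big-endian one returns 0x11223344, while a byte load from address 0 returns 0x11 in both cases.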

SHADD Parallel halving add and subtract operations

| Format | Syntax | Architecture |
| :--- | :--- | :--- |
| 1 | `{S\|U}HADD16<cond> Rd, Rn, Rm` | ARMv6 |
| 2 | `{S\|U}HADDSUBX<cond> Rd, Rn, Rm` | ARMv6 |
| 3 | `{S\|U}HSUBADDX<cond> Rd, Rn, Rm` | ARMv6 |
| 4 | `{S\|U}HSUB16<cond> Rd, Rn, Rm` | ARMv6 |
| 5 | `{S\|U}HADD8<cond> Rd, Rn, Rm` | ARMv6 |
| 6 | `{S\|U}HSUB8<cond> Rd, Rn, Rm` | ARMv6 |

Action

$$
\begin{aligned}
& \text{1. } \mathrm{Rd}[31{:}16]=(\mathrm{Rn}[31{:}16]+\mathrm{Rm}[31{:}16]) \gg 1; \\
& \quad \mathrm{Rd}[15{:}00]=(\mathrm{Rn}[15{:}00]+\mathrm{Rm}[15{:}00]) \gg 1 \\
& \text{2. } \mathrm{Rd}[31{:}16]=(\mathrm{Rn}[31{:}16]+\mathrm{Rm}[15{:}00]) \gg 1; \\
& \quad \mathrm{Rd}[15{:}00]=(\mathrm{Rn}[15{:}00]-\mathrm{Rm}[31{:}16]) \gg 1 \\
& \text{3. } \mathrm{Rd}[31{:}16]=(\mathrm{Rn}[31{:}16]-\mathrm{Rm}[15{:}00]) \gg 1; \\
& \quad \mathrm{Rd}[15{:}00]=(\mathrm{Rn}[15{:}00]+\mathrm{Rm}[31{:}16]) \gg 1 \\
& \text{4. } \mathrm{Rd}[31{:}16]=(\mathrm{Rn}[31{:}16]-\mathrm{Rm}[31{:}16]) \gg 1; \\
& \quad \mathrm{Rd}[15{:}00]=(\mathrm{Rn}[15{:}00]-\mathrm{Rm}[15{:}00]) \gg 1 \\
& \text{5. } \mathrm{Rd}[31{:}24]=(\mathrm{Rn}[31{:}24]+\mathrm{Rm}[31{:}24]) \gg 1; \\
& \quad \mathrm{Rd}[23{:}16]=(\mathrm{Rn}[23{:}16]+\mathrm{Rm}[23{:}16]) \gg 1; \\
& \quad \mathrm{Rd}[15{:}08]=(\mathrm{Rn}[15{:}08]+\mathrm{Rm}[15{:}08]) \gg 1; \\
& \quad \mathrm{Rd}[07{:}00]=(\mathrm{Rn}[07{:}00]+\mathrm{Rm}[07{:}00]) \gg 1 \\
& \text{6. } \mathrm{Rd}[31{:}24]=(\mathrm{Rn}[31{:}24]-\mathrm{Rm}[31{:}24]) \gg 1; \\
& \quad \mathrm{Rd}[23{:}16]=(\mathrm{Rn}[23{:}16]-\mathrm{Rm}[23{:}16]) \gg 1; \\
& \quad \mathrm{Rd}[15{:}08]=(\mathrm{Rn}[15{:}08]-\mathrm{Rm}[15{:}08]) \gg 1; \\
& \quad \mathrm{Rd}[07{:}00]=(\mathrm{Rn}[07{:}00]-\mathrm{Rm}[07{:}00]) \gg 1
\end{aligned}
$$

Notes
- If you use the S prefix, then all operations are signed, and values are sign-extended before the addition or subtraction.
- If you use the U prefix, then all operations are unsigned, and values are zero-extended before the addition or subtraction.
- Rd, Rn, and Rm must not be pc.
- These operations provide parallel arithmetic that cannot overflow, since each sum or difference is halved before it is written back. This is useful for DSP processing of normalized signals.
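
The halving operations can be modeled in C by forming each sum in a wider type before shifting, which is why they cannot overflow. The sketch below covers SHADD16 and UHADD16 only; the other formats follow the same pattern with the operands paired as listed in the Action. The function names are illustrative, and the helpers avoid relying on implementation-defined C behavior for right shifts of negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 16-bit field without relying on implementation-defined
 * narrowing conversions. */
static int32_t sext16(uint32_t x)
{
    return (int32_t)((x & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Arithmetic shift right by one (floor of x / 2), written portably:
 * for negative x, ~x is non-negative, so shifting it is well defined. */
static int32_t asr1(int32_t x)
{
    return x >= 0 ? x >> 1 : ~(~x >> 1);
}

/* SHADD16: each 16-bit sum is formed in 32 bits, so it cannot overflow,
 * and the halved result always fits back into 16 bits. */
static uint32_t shadd16(uint32_t rn, uint32_t rm)
{
    uint32_t rd = 0;
    for (int i = 0; i < 2; i++) {
        int32_t sum = sext16(rn >> (16 * i)) + sext16(rm >> (16 * i));
        rd |= ((uint32_t)asr1(sum) & 0xFFFFu) << (16 * i);
    }
    return rd;
}

/* UHADD16: the same with zero extension. */
static uint32_t uhadd16(uint32_t rn, uint32_t rm)
{
    uint32_t rd = 0;
    for (int i = 0; i < 2; i++) {
        uint32_t sum = ((rn >> (16 * i)) & 0xFFFFu) + ((rm >> (16 * i)) & 0xFFFFu);
        rd |= (sum >> 1) << (16 * i);
    }
    return rd;
}
```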

SMLA Signed multiply accumulate instructions
SMLS

| Syntax | Operands | Architecture |
| :--- | :--- | :--- |
| 1. SMLA<x><y><cond> | Rd, Rm, Rs, Rn | ARMv5E |
| 2. SMLAW<y><cond> | Rd, Rm, Rs, Rn | ARMv5E |
| 3. SMLAD{X}<cond> | Rd, Rm, Rs, Rn | ARMv6 |
| 4. SMLSD{X}<cond> | Rd, Rm, Rs, Rn | ARMv6 |
| 5. {U\|S}MLAL<cond>{S} | RdLo, RdHi, Rm, Rs | ARMv3M |
| 6. SMLAL<x><y><cond> | RdLo, RdHi, Rm, Rs | ARMv5E |
| 7. SMLALD{X}<cond> | RdLo, RdHi, Rm, Rs | ARMv6 |
| 8. SMLSLD{X}<cond> | RdLo, RdHi, Rm, Rs | ARMv6 |

Action
1. Rd = Rn + (Rm.<x> * Rs.<y>)
2. Rd = Rn + (((signed)Rm * Rs.<y>) >> 16)
3. Rd = Rn + Rm.B*<rotated_Rs>.B + Rm.T*<rotated_Rs>.T
4. Rd = Rn + Rm.B*<rotated_Rs>.B - Rm.T*<rotated_Rs>.T
5. RdHi:RdLo = RdHi:RdLo + (Rm * Rs)
6. RdHi:RdLo = RdHi:RdLo + (Rm.<x> * Rs.<y>)
7. RdHi:RdLo = RdHi:RdLo + Rm.B*<rotated_Rs>.B + Rm.T*<rotated_Rs>.T
8. RdHi:RdLo = RdHi:RdLo + Rm.B*<rotated_Rs>.B - Rm.T*<rotated_Rs>.T

## Notes

- <x> and <y> can be B or T.
- Rm.B is shorthand for (sign-extend)Rm[15:00], the bottom 16 bits of Rm.
- Rm.T is shorthand for (sign-extend)Rm[31:16], the top 16 bits of Rm.
- <rotated_Rs> is Rs if you do not specify the X suffix, or Rs ROR 16 if you do specify the X suffix.
- RdHi and RdLo must be different registers. For format 5, Rm must be a different register from RdHi and RdLo.
- Formats 1 to 4 update the cpsr Q-flag: Q = Q | <SignedOverflow>.
- Format 5 implements an unsigned multiply with the U prefix or a signed multiply with the S prefix.
- Format 5 updates the cpsr if the S suffix is present: N = RdHi[31], Z = (RdHi==0 && RdLo==0); the C and V flags are unpredictable. Avoid using {U|S}MLALS because implementations often impose penalty cycles for this operation.
- Implementations may terminate early on the value of Rs. For this reason, use small or constant values for Rs where possible.
- The X suffix and multiply subtract versions are useful for packed complex numbers. The following examples assume bits [15:00] hold the real part and [31:16] the imaginary part.


## Examples

```
SMLABB r0,r1, r2, r0 ; r0 += (short)r1 * (short)r2
SMLABT r0,r1, r2, r0 ; r0 += (short)r1 * ((signed)r2>>16)
SMLAWB r0,r1, r2, r0 ; r0 += (r1*(short)r2)>>16
SMLAL r0, r1, r2, r3 ; acc += r2*r3, acc is 64 bits [r1:r0]
SMLALTB r0, r1, r2, r3 ; acc += ((signed)r2>>16)*((short)r3)
SMLSD r0,r1, r2, r0 ; r0 += real(r1*r2) in complex maths
SMLADX r0,r1, r2, r0 ; r0 += imag(r1*r2) in complex maths
```
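
The complex-number examples above can be checked with a small C model of SMLAD{X} and SMLSD{X}. It assumes the packing used in the examples (real part in bits [15:00], imaginary part in [31:16]); the helper names are hypothetical, and the model wraps the accumulation modulo 2^32 without modeling the Q flag.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extended bottom (Rm.B) and top (Rm.T) halfwords. */
static int32_t half_b(uint32_t r) { return (int32_t)((r & 0xFFFFu) ^ 0x8000u) - 0x8000; }
static int32_t half_t(uint32_t r) { return half_b(r >> 16); }

/* <rotated_Rs>: Rs, or Rs ROR 16 when the X suffix is given. */
static uint32_t rotated_rs(uint32_t rs, int x) { return x ? (rs >> 16) | (rs << 16) : rs; }

/* SMLAD{X}: Rd = Rn + Rm.B*<rotated_Rs>.B + Rm.T*<rotated_Rs>.T.
 * The sum wraps modulo 2^32; the Q flag is not modeled. */
static uint32_t smlad(uint32_t rm, uint32_t rs, uint32_t rn, int x)
{
    uint32_t r = rotated_rs(rs, x);
    int64_t dual = (int64_t)half_b(rm) * half_b(r) + (int64_t)half_t(rm) * half_t(r);
    return rn + (uint32_t)dual;
}

/* SMLSD{X}: Rd = Rn + Rm.B*<rotated_Rs>.B - Rm.T*<rotated_Rs>.T. */
static uint32_t smlsd(uint32_t rm, uint32_t rs, uint32_t rn, int x)
{
    uint32_t r = rotated_rs(rs, x);
    int64_t dual = (int64_t)half_b(rm) * half_b(r) - (int64_t)half_t(rm) * half_t(r);
    return rn + (uint32_t)dual;
}

/* Pack a complex number: real part in [15:00], imaginary part in [31:16]. */
static uint32_t pack(int32_t re, int32_t im)
{
    assert(re >= -32768 && re <= 32767 && im >= -32768 && im <= 32767);
    return ((uint32_t)(uint16_t)im << 16) | (uint16_t)re;
}
```

With a = 3 + 4i and b = 5 - 2i the product is 23 + 14i, so `smlsd(pack(3, 4), pack(5, -2), 0, 0)` gives 23 and `smlad(pack(3, 4), pack(5, -2), 0, 1)` gives 14, matching the SMLSD and SMLADX examples.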

SMMUL Signed most significant word multiply instructions
SMMLA
SMMLS

1. SMMUL{R}<cond> Rd, Rm, Rs ARMv6
2. SMMLA{R}<cond> Rd, Rm, Rs, Rn ARMv6
3. SMMLS{R}<cond> Rd, Rm, Rs, Rn ARMv6

## Action

1. Rd = ((signed)Rm * (signed)Rs + round) >> 32
2. Rd = ((Rn << 32) + (signed)Rm * (signed)Rs + round) >> 32
3. Rd = ((Rn << 32) - (signed)Rm * (signed)Rs + round) >> 32

Notes

- If you specify the R suffix, then round = $2^{31}$; otherwise, round = 0.
- Rd, Rm, Rs, and Rn must not be pc.
- Implementations may terminate early on the value of Rs.
- For 32-bit DSP algorithms these operations have several advantages over using the high result register of SMLAL: they often take fewer cycles, they support rounding and multiply subtract, and they do not need a scratch register for the low 32 bits of the result.

Example
SMMULR r0, r1, r2 ; r0=r1*r2/2 using Q31 arithmetic
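
The rounding rule in the Notes is easiest to see in a model that forms the full 64-bit value and keeps its top word. The sketch below is illustrative C, not ARM's pseudocode; `smm` and `smmul` are hypothetical names, and unsigned 64-bit arithmetic is used so that wrap-around is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Model of SMMLA{R} (sub = 0) and SMMLS{R} (sub = 1): build the 64-bit
 * value (Rn << 32) +/- Rm*Rs, add 2^31 when rounding, keep bits [63:32]. */
static uint32_t smm(int32_t rn, int32_t rm, int32_t rs, int sub, int round)
{
    uint64_t product = (uint64_t)((int64_t)rm * rs);
    uint64_t acc = (uint64_t)(uint32_t)rn << 32;
    acc = sub ? acc - product : acc + product;
    if (round)
        acc += UINT64_C(1) << 31;
    return (uint32_t)(acc >> 32);
}

/* SMMUL{R} is the same operation with Rn = 0. */
static uint32_t smmul(int32_t rm, int32_t rs, int round)
{
    return smm(0, rm, rs, 0, round);
}
```

In Q31 format 0x40000000 represents 0.5, and `smmul(0x40000000, 0x40000000, 1)` returns 0x10000000, which is 0.125 in Q31: half of the true product 0.25, as the SMMULR example states. Without rounding, the top word truncates toward minus infinity, so a product of -2^31 gives -1 rather than 0.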

SMUL Signed multiply instructions
SMUA
SMUS

| Syntax | Operands | Architecture |
| :--- | :--- | :--- |
| 1. SMUL<x><y><cond> | Rd, Rm, Rs | ARMv5E |
| 2. SMULW<y><cond> | Rd, Rm, Rs | ARMv5E |
| 3. SMUAD{X}<cond> | Rd, Rm, Rs | ARMv6 |
| 4. SMUSD{X}<cond> | Rd, Rm, Rs | ARMv6 |
| 5. {U\|S}MULL<cond>{S} | RdLo, RdHi, Rm, Rs | ARMv3M |

Action

1. Rd = Rm.<x> * Rs.<y>
2. Rd = ((signed)Rm * Rs.<y>) >> 16
3. Rd = Rm.B*<rotated_Rs>.B + Rm.T*<rotated_Rs>.T
4. Rd = Rm.B*<rotated_Rs>.B - Rm.T*<rotated_Rs>.T
5. RdHi:RdLo = Rm * Rs

Notes
- <x> and <y> can be B or T.
- Rm.B is shorthand for (sign-extend)Rm[15:00], the bottom 16 bits of Rm.
- Rm.T is shorthand for (sign-extend)Rm[31:16], the top 16 bits of Rm.
- <rotated_Rs> is Rs if you do not specify the X suffix, or Rs ROR 16 if you do specify the X suffix.
- RdHi and RdLo must be different registers. For format 5, Rm must be a different register from RdHi and RdLo.
- Format 3 updates the cpsr Q-flag: Q = Q | <SignedOverflow>. Format 4 cannot overflow.
- Format 5 implements an unsigned multiply with the U prefix or a signed multiply with the S prefix.
- Format 5 updates the cpsr if the S suffix is present: N = RdHi[31], Z = (RdHi==0 && RdLo==0); the C and V flags are unpredictable. Avoid using {S|U}MULLS because implementations often impose penalty cycles for this operation.
- Implementations may terminate early on the value of Rs. For this reason, use small or constant values for Rs where possible.
- The X suffix and multiply subtract versions are useful for packed complex numbers. The following examples assume bits [15:00] hold the real part and [31:16] the imaginary part.

## Examples

```
SMULBB r0, r1, r2 ; r0 = (short)r1 * (short)r2
SMULBT r0, r1, r2 ; r0 = (short)r1 * ((signed)r2>>16)
SMULWB r0, r1, r2 ; r0 = (r1*(short)r2)>>16
SMULL r0, r1, r2, r3 ; acc = r2*r3, acc is 64 bits [r1:r0]
SMUADX r0, r1, r2 ; r0 = imag(r1*r2) in complex maths
```
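
The sketch below models SMULW<y> and SMUAD{X} in C to illustrate two points from the Notes: SMULW keeps bits [47:16] of a 48-bit product, and SMUAD can overflow (only when both halfword products are (-32768) times (-32768)), which is why it updates the Q flag. The names are hypothetical, and the sticky Q flag is represented by an int passed by pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extended bottom (top = 0) or top (top = 1) halfword of r. */
static int32_t half(uint32_t r, int top)
{
    uint32_t h = (top ? r >> 16 : r) & 0xFFFFu;
    return (int32_t)(h ^ 0x8000u) - 0x8000;
}

/* SMULW<y>: the 48-bit product Rm * Rs.<y>, keeping bits [47:16]. */
static uint32_t smulw(int32_t rm, uint32_t rs, int top)
{
    uint64_t product = (uint64_t)((int64_t)rm * half(rs, top));
    return (uint32_t)(product >> 16);
}

/* SMUAD{X}: Rm.B*<rotated_Rs>.B + Rm.T*<rotated_Rs>.T. The sum can reach
 * 2^31, so *q is set on signed overflow (it is never cleared here). */
static uint32_t smuad(uint32_t rm, uint32_t rs, int x, int *q)
{
    uint32_t r = x ? (rs >> 16) | (rs << 16) : rs;
    int64_t sum = (int64_t)half(rm, 0) * half(r, 0) + (int64_t)half(rm, 1) * half(r, 1);
    if (sum > INT32_MAX || sum < INT32_MIN)
        *q = 1;
    return (uint32_t)sum;
}
```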

SRS Store return state

1. SRS<amode> #<mode>{!} ARMv6

This performs the operation that STM<amode> sp_<mode>{!}, {lr, spsr} would perform if STM allowed a register list of {lr, spsr} and allowed you to reference the stack pointer of a different mode. See the entry for STM.

SSAT Saturate to $n$ bits

```
1. {S|U}SAT<cond> Rd, #<n>, Rm {, LSL #<0-31>}   ARMv6
2. {S|U}SAT<cond> Rd, #<n>, Rm {, ASR #<1-32>}   ARMv6
3. {S|U}SAT16<cond> Rd, #<n>, Rm                 ARMv6
```

Action
1. Rd = sat(<shifted_Rm>, n); Q = Q | 1 if saturation occurred
2. Rd = sat(<shifted_Rm>, n); Q = Q | 1 if saturation occurred
3. Rd[31:16] = sat(Rm[31:16], n); Rd[15:00] = sat(Rm[15:00], n); Q = Q | 1 if saturation occurred

Notes
- If you specify the S prefix, then sat(x, n) saturates the signed value x to a signed n-bit value in the range $-2^{n-1} \leqslant x < 2^{n-1}$. n is encoded as 1 + <immed5> for SAT and 1 + <immed4> for SAT16.

- If you specify the U prefix, then sat(x, n) saturates the signed value x to an unsigned n-bit value in the range $0 \leqslant x < 2^{n}$. n is encoded as <immed5> for SAT and <immed4> for SAT16.
- Rd and Rm must not be pc.
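
A minimal C model of sat(x, n) follows, assuming the value has already been shifted. The flag `q` stands in for the sticky cpsr Q flag; the function names and the use of 64-bit limits (so that n = 32 needs no special case) are choices of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* sat(x, n) for the S prefix: clamp x to [-2^(n-1), 2^(n-1) - 1].
 * *q models the sticky Q flag: it is set, never cleared. */
static int32_t ssat(int32_t x, unsigned n, int *q)
{
    assert(n >= 1 && n <= 32);
    int64_t max = ((int64_t)1 << (n - 1)) - 1;
    int64_t min = -((int64_t)1 << (n - 1));
    if (x > max) { *q = 1; return (int32_t)max; }
    if (x < min) { *q = 1; return (int32_t)min; }
    return x;
}

/* sat(x, n) for the U prefix: clamp the signed x to [0, 2^n - 1]. */
static uint32_t usat(int32_t x, unsigned n, int *q)
{
    assert(n <= 31);
    int64_t max = ((int64_t)1 << n) - 1;
    if (x > max) { *q = 1; return (uint32_t)max; }
    if (x < 0)   { *q = 1; return 0; }
    return (uint32_t)x;
}
```

For example, `ssat(200, 8, &q)` returns 127 and sets `q`, while `usat(-5, 8, &q)` returns 0 and also sets `q`, because a negative input lies outside the unsigned range.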

SSUB Signed parallel subtract (see SADD)

STC Store to coprocessor single or multiple 32-bit values

| Syntax | Operands | Architecture |
| :--- | :--- | :--- |
| 1. STC<cond>{L} | <copro>, Cd, [Rn {, #{-}<immed8>*4}]{!} | ARMv2 |
| 2. STC<cond>{L} | <copro>, Cd, [Rn], #{-}<immed8>*4 | ARMv2 |
| 3. STC<cond>{L} | <copro>, Cd, [Rn], <option> | ARMv2 |
| 4. STC2{L} | <copro>, Cd, [Rn {, #{-}<immed8>*4}]{!} | ARMv5 |
| 5. STC2{L} | <copro>, Cd, [Rn], #{-}<immed8>*4 | ARMv5 |
| 6. STC2{L} | <copro>, Cd, [Rn], <option> | ARMv5 |

These instructions initiate a memory write, transferring data to memory from the given coprocessor. <copro> is the number of the coprocessor in the range p0 to p15. The core takes an undefined instruction trap if the coprocessor is not present. The memory write consists of a sequence of words to sequentially increasing addresses. The initial address is specified by the addressing mode in Table B1.10. The coprocessor controls the number of words transferred, up to a maximum limit of 16 words. The fields {L} and Cd are interpreted by the coprocessor and ignored by the ARM. Typically Cd specifies the source coprocessor register for the transfer. The <option> field is an eight-bit integer enclosed in {}. Its interpretation is coprocessor dependent.

If the address is not a multiple of four, then the access is unaligned. The restrictions on an unaligned access are the same as for STM.

TABLE B1.10 STC addressing modes.

| Addressing format | Address accessed | Value written back to Rn |
| :---: | :---: | :---: |
| [Rn {, #{-}<immed>}] | Rn + {{-}<immed>} | Rn preserved |
| [Rn {, #{-}<immed>}]! | Rn + {{-}<immed>} | Rn + {{-}<immed>} |
| [Rn], #{-}<immed> | Rn | Rn + {-}<immed> |
| [Rn], <option> | Rn | Rn preserved |

STM Store multiple 32-bit registers to memory

| Syntax | Operands | Architecture |
| :--- | :--- | :--- |
| 1. STM<cond><amode> | Rn{!}, <register_list>{^} | ARMv1 |
| 2. STMIA | Rn!, <register_list> | THUMBv1 |

These instructions store multiple words to sequential memory addresses. The <register_list> specifies a list of registers to store, enclosed in curly brackets {}. Although the assembler allows you to specify the registers in the list in any order, the order is not stored in the instruction, so it is good practice to write the list in increasing order of register number, since this matches the order in which the registers are laid out in memory.

The following pseudocode shows the normal action of STM. We use <register_list>[i] to denote the register appearing at position i in the list, starting at 0 for the first register. This assumes that the list is in order of increasing register number.

```
N = the number of registers in <register_list>
start = the lowest address accessed given in Table B1.11
for (i=0; i<N; i++)
    memory(start+i*4, 4) = <register_list>[i];
if (! is specified) then update Rn according to Table B1.11
```

Note that memory (a, 4) refers to the four bytes at address a packed according to the current processor data endianness. If a is not a multiple of four, then the store is unaligned. Because the behavior of an unaligned store depends on the architecture revision, memory system, and system coprocessor (CP15) configuration, it is best to avoid unaligned stores if possible. Assuming that the external memory system does not abort unaligned stores, then the following rules usually apply:

- If the core has a system coprocessor and bit 1 (A-bit) or bit 22 (U-bit) of CP15:c1:c0:0 is set, then unaligned store-multiples cause an alignment fault data abort exception.
- Otherwise, the access ignores the bottom two address bits.

Table B1.11 lists the possible addressing modes specified by <amode>. If you specify the !, then the base address register is updated according to Table B1.11; otherwise, it is preserved. Note that the lowest register number is always written to the lowest address.

TABLE B1.11 STM addressing modes.

| Addressing <br> mode | Lowest address <br> accessed | Highest address <br> accessed | Value written back <br> to Rn if ! specified |
| :--- | :--- | :--- | :--- |
| {IA\|EA} | Rn | Rn+N*4-4 | Rn+N*4 |
| {IB\|FA} | Rn+4 | Rn+N*4 | Rn+N*4 |
| {DA\|ED} | Rn-N*4+4 | Rn | Rn-N*4 |
| {DB\|FD} | Rn-N*4 | Rn-4 | Rn-N*4 |

The first mnemonic in each row of Table B1.11 (IA, IB, DA, DB) stands for Increment After, Increment Before, Decrement After, and Decrement Before, respectively. Increment modes store the registers sequentially forward starting from address Rn (increment after) or Rn+4 (increment before). Decrement modes have the same effect as if you stored the register list backwards to sequentially descending memory addresses starting from address Rn (decrement after) or Rn-4 (decrement before).

The second mnemonic in each row (EA, FA, ED, FD) stands for the stack type you can implement with that addressing mode: Empty Ascending, Full Ascending, Empty Descending, and Full Descending, respectively. With a full stack, Rn points to the last stacked value. With an empty stack, Rn points to the first unused stack location. ARM stacks are usually full descending. You should use full descending or empty ascending stacks by preference, since STC also supports these addressing modes.

## Notes

- For Thumb (format 2), Rn and the register list registers must be in the range r0 to r7.
- The number of registers N in the list must be nonzero.
- Rn must not be pc.
- If Rn appears in the register list and ! (writeback) is specified, the behavior is as follows: if Rn is the lowest register number in the list, then the original value is stored; otherwise, the stored value is unpredictable.
- If pc appears in the register list, then the value stored is implementation defined.
- If ^ is specified, then the operation is modified. The processor must not be in user or system mode. The registers appearing in the register list refer to the user mode versions of the registers, and writeback must not be specified.
- The time order of the memory accesses may depend on the implementation. Be careful when using a store multiple to access I/O locations where the access order matters. If the order matters, then check that the memory locations are marked as I/O in the page tables. Do not cross page boundaries, and do not use pc in the register list.


## Examples

```
STMIA r4!, {r0, r1} ; *r4=r0, *(r4+4)=r1, r4+=8
STMDB r4!, {r0, r1} ; *(r4-4)=r1, *(r4-8)=r0, r4-=8
STMEQFD sp!, {r0, lr} ; if (result zero) then stack r0, lr
STMFD sp, {sp}^ ; store sp_usr on stack sp_current
```
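
Table B1.11 translates directly into code. The sketch below assumes a word-addressed memory array indexed by address/4 and word-aligned addresses; the enum, the function names, and the memory representation are hypothetical. For stores, EA is the same mode as IA, FA as IB, ED as DA, and FD as DB.

```c
#include <assert.h>
#include <stdint.h>

enum amode { IA, IB, DA, DB };  /* EA = IA, FA = IB, ED = DA, FD = DB */

/* Returns the lowest address accessed by an STM of n registers and
 * stores in *wb the value written back to Rn when ! is specified. */
static uint32_t stm_lowest(enum amode m, uint32_t rn, unsigned n, uint32_t *wb)
{
    assert(n >= 1 && n <= 16);
    switch (m) {
    case IA: *wb = rn + 4 * n; return rn;
    case IB: *wb = rn + 4 * n; return rn + 4;
    case DA: *wb = rn - 4 * n; return rn - 4 * n + 4;
    case DB: *wb = rn - 4 * n; return rn - 4 * n;
    }
    return 0; /* not reached */
}

/* Stores regs[0..n-1] (already in increasing register order) so that the
 * lowest-numbered register lands at the lowest address. */
static void stm(uint32_t *mem, enum amode m, uint32_t *rn, int writeback,
                const uint32_t *regs, unsigned n)
{
    uint32_t wb;
    uint32_t start = stm_lowest(m, *rn, n, &wb);
    assert((start & 3u) == 0); /* this sketch handles aligned stores only */
    for (unsigned i = 0; i < n; i++)
        mem[(start + 4 * i) / 4] = regs[i];
    if (writeback)
        *rn = wb;
}
```

This reproduces the STMDB r4!, {r0, r1} example: with r4 = 0x20, r0 is stored at 0x18, r1 at 0x1C, and r4 becomes 0x18.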

STR Store a single value to a virtual address in memory

| Syntax | Operands | Architecture |
| :--- | :--- | :--- |
| 1. STR<cond>{\|B} | Rd, [Rn {, #{-}<immed12>}]{!} | ARMv1 |
| 2. STR<cond>{\|B} | Rd, [Rn, {-}Rm {, <imm_shift>}]{!} | ARMv1 |
| 3. STR<cond>{\|B}{T} | Rd, [Rn], #{-}<immed12> | ARMv1 |
| 4. STR<cond>{\|B}{T} | Rd, [Rn], {-}Rm {, <imm_shift>} | ARMv1 |
| 5. STR<cond>H | Rd, [Rn {, #{-}<immed8>}]{!} | ARMv4 |
| 6. STR<cond>H | Rd, [Rn, {-}Rm]{!} | ARMv4 |
| 7. STR<cond>H | Rd, [Rn], #{-}<immed8> | ARMv4 |
| 8. STR<cond>H | Rd, [Rn], {-}Rm | ARMv4 |
| 9. STR<cond>D | Rd, [Rn {, #{-}<immed8>}]{!} | ARMv5E |
| 10. STR<cond>D | Rd, [Rn, {-}Rm]{!} | ARMv5E |
| 11. STR<cond>D | Rd, [Rn], #{-}<immed8> | ARMv5E |
| 12. STR<cond>D | Rd, [Rn], {-}Rm | ARMv5E |
| 13. STREX<cond> | Rd, Rm, [Rn] | ARMv6 |
| 14. STR{\|B\|H} | Ld, [Ln, #<immed5>*<size>] | THUMBv1 |
| 15. STR{\|B\|H} | Ld, [Ln, Lm] | THUMBv1 |
| 16. STR | Ld, [sp, #<immed8>*4] | THUMBv1 |
| 17. STR<cond><type> | Rd, <label> | MACRO |

Formats 1 to 16 store a single data item of the type specified by the opcode suffix, using a preindexed or postindexed addressing mode. Tables B1.12 and B1.13 show the different addressing modes and data types.

In Table B1.13, memory(a, n) refers to n sequential bytes at address a. The bytes are packed according to the configured processor data endianness. memoryT(a, n) performs the access with user mode privileges, regardless of the current processor mode. The action of the function IsExclusive(a) used by STREX depends on address a. If a has the shared TLB attribute, then IsExclusive(a) is true if address a is marked as exclusive for this processor. It then clears any exclusive accesses on this processor and any exclusive accesses to address a on other processors in the system. If a does not have the shared TLB attribute, then IsExclusive(a) is true if there is an outstanding exclusive access on this processor. It then clears any such outstanding access.

TABLE B1.12 STR addressing modes.

| Addressing format | Address a accessed | Value written back to Rn |
| :---: | :---: | :---: |
| [Rn {, #{-}<immed>}] | Rn + {{-}<immed>} | Rn preserved |
| [Rn {, #{-}<immed>}]! | Rn + {{-}<immed>} | Rn + {{-}<immed>} |
| [Rn, {-}Rm {, <shift>}] | Rn + {-}<shifted_Rm> | Rn preserved |
| [Rn, {-}Rm {, <shift>}]! | Rn + {-}<shifted_Rm> | Rn + {-}<shifted_Rm> |
| [Rn], #{-}<immed> | Rn | Rn + {-}<immed> |
| [Rn], {-}Rm {, <shift>} | Rn | Rn + {-}<shifted_Rm> |

TABLE B1.13 STR data types.

| Store | Datatype | <size> (bytes) | Action |
| :---: | :---: | :---: | :---: |
| STR | word | 4 | memory(a, 4) = Rd |
| STRB | unsigned Byte | 1 | memory(a, 1) = (char)Rd |
| STRBT | Byte Translated | 1 | memoryT(a, 1) = (char)Rd |
| STRD | Double word | 8 | memory(a, 4) = Rd;<br>memory(a+4, 4) = R(d+1) |
| STREX | word EXclusive | 4 | if (IsExclusive(a)) {<br>memory(a, 4) = Rm; Rd = 0;<br>} else {<br>Rd = 1;<br>} |
| STRH | unsigned Halfword | 2 | memory(a, 2) = (short)Rd |
| STRT | word Translated | 4 | memoryT(a, 4) = Rd |

If the address a is not a multiple of <size>, then the store is unaligned. Because the behavior of an unaligned store depends on the architecture revision, memory system, and system coprocessor (CP15) configuration, it is best to avoid unaligned stores if possible. Assuming that the external memory system does not abort unaligned stores, then the following rules usually apply. In the rules, A is bit 1 of system coprocessor register CP15:c1:c0:0, and U is bit 22 of CP15:c1:c0:0, introduced in ARMv6. If there is no system coprocessor, then A = U = 0.

- If A = 1, then unaligned stores cause an alignment fault data abort exception, except that word-aligned double-word stores are supported if U = 1.

- If A = 0 and U = 1, then unaligned stores are supported for STR, STRT, and STRH. Word-aligned stores are supported for STRD. A non-word-aligned STRD generates an alignment fault data abort.
- If A = 0 and U = 0, then STR and STRT write to memory(a & ~3, 4). All other unaligned operations are unpredictable but do not cause an alignment fault.

Format 17 generates a pc-relative store accessing the address specified by <label>. In other words, it assembles to STR<cond><type> Rd, [pc, #<offset>] whenever this instruction is supported and <offset> = <label> - pc is in range.

## Notes

- For double-word stores (formats 9 to 12), Rd must be even and in the range r0 to r12.
- If the addressing mode updates Rn, then Rd and Rn must be distinct.
- If Rd is pc, then <size> must be 4. The value stored is implementation defined.
- If Rn is pc, then the addressing mode must not update Rn. The value used for Rn is the address of the instruction plus eight bytes.
- Rm must not be pc.

## Examples

| `STR` | `r0, [r0]` | `; *(int*)r0 = r0;` |
| :--- | :--- | :--- |
| `STRH` | `r0, [r1], #4` | `; *(short*)r1 = r0; r1 += 4;` |
| `STRD` | `r2, [r1, #-8]!` | `; r1 -= 8; *(int*)r1 = r2; *(int*)(r1+4) = r3;` |
| `STRB` | `r0, [r2, #55]` | `; *(char*)(r2+55) = r0;` |
| `STRB` | `r0, [r1], -r2, LSL #8` | `; *(char*)r1 = r0; r1 -= 256*r2;` |
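
The preindexed and postindexed rules in Table B1.12 reduce to two choices: whether the access uses the offset address, and whether Rn is written back. The C sketch below assumes the offset has already had any shift and sign applied; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct str_addr {
    uint32_t address;  /* address a that is accessed */
    uint32_t new_rn;   /* value of Rn after the instruction */
};

/* offset is the signed immediate or the shifted Rm with its sign applied.
 * preindex = 1 for the [Rn, ...] forms, 0 for the [Rn], ... forms;
 * writeback = 1 when ! is given (preindexed forms only). */
static struct str_addr str_address(uint32_t rn, int32_t offset,
                                   int preindex, int writeback)
{
    assert(preindex || !writeback); /* postindexed forms take no ! */
    struct str_addr r;
    uint32_t target = rn + (uint32_t)offset;
    if (preindex) {
        r.address = target;
        r.new_rn = writeback ? target : rn;
    } else {
        r.address = rn;      /* access at the original base */
        r.new_rn = target;   /* base is always updated */
    }
    return r;
}
```

For STRD r2, [r1, #-8]! with r1 = 0x100, both the access and the new r1 are 0xF8; for STRH r0, [r1], #4 the access is at 0x100 and r1 becomes 0x104.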

SUB Subtract two 32-bit values

1. SUB<cond>{S} Rd, Rn, #<rotated_immed> ARMv1
2. SUB<cond>{S} Rd, Rn, Rm {, <shift>} ARMv1
3. SUB Ld, Ln, #<immed3> THUMBv1
4. SUB Ld, #<immed8> THUMBv1
5. SUB Ld, Ln, Lm THUMBv1
6. SUB sp, #<immed7>*4 THUMBv1

Action

| Operation | Effect on the cpsr |
| :--- | :--- |
| 1. Rd = Rn - <rotated_immed> | Updated if S suffix specified |
| 2. Rd = Rn - <shifted_Rm> | Updated if S suffix specified |
| 3. Ld = Ln - <immed3> | Updated (see Notes below) |
| 4. Ld = Ld - <immed8> | Updated (see Notes below) |
| 5. Ld = Ln - Lm | Updated (see Notes below) |
| 6. sp = sp - <immed7>*4 | Preserved |

Notes
- If the operation updates the cpsr and Rd is not pc, then N = <Negative>, Z = <Zero>, C = <NoUnsignedOverflow>, and V = <SignedOverflow>. The carry flag is set this way because the subtract x - y is implemented as the add x + ~y + 1. The carry flag is one if x + ~y + 1 overflows. This happens when x >= y as unsigned values, that is, when the unsigned subtraction x - y does not need to borrow.

- If Rd is pc, then the instruction effects a jump to the calculated address. If the operation updates the cpsr, then the processor mode must have an spsr; in this case, the cpsr is set to the value of the spsr.
- If Rn or Rm is pc, then the value used is the address of the instruction plus eight bytes.

Examples

```
SUBS r0, r0, #1 ; r0 -= 1, setting flags
SUB r0, r1, r1, LSL #2 ; r0 = -3*r1
SUBS pc, lr, #4 ; jump to lr-4, set cpsr=spsr
```
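
The carry rule in the Notes can be checked with a C model that performs the subtraction as the addition x + ~y + 1 in 64 bits and reads the carry from bit 32. This is a sketch; `subs` and `struct flags` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct flags { int n, z, c, v; };

/* Model of SUBS Rd, Rn, <op2>: the subtraction is done as the addition
 * rn + ~op2 + 1, so C is the carry out of bit 31 of that addition. */
static uint32_t subs(uint32_t rn, uint32_t op2, struct flags *f)
{
    uint64_t wide = (uint64_t)rn + (uint64_t)(uint32_t)~op2 + 1u;
    uint32_t rd = (uint32_t)wide;
    f->n = (int)(rd >> 31);
    f->z = (rd == 0);
    f->c = (int)(wide >> 32);  /* 1 when rn >= op2 (no borrow) */
    /* signed overflow: operands differ in sign and the result's sign
     * differs from rn's */
    f->v = (int)(((rn ^ op2) & (rn ^ rd)) >> 31);
    return rd;
}
```

For example, `subs(3, 5, &f)` clears C because 3 < 5 needs a borrow, while `subs(5, 5, &f)` sets both C and Z.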

SWI Software interrupt

```
1. SWI<cond> <immed24> ARMv1
2. SWI <immed8> THUMBv1
```

The SWI instruction causes the ARM to enter supervisor mode and start executing from the SWI vector. The return address and cpsr are saved in lr_svc and spsr_svc, respectively. The processor switches to ARM state and IRQ interrupts are disabled. The SWI vector is at address 0x00000008, unless high vectors are configured; then it is at address 0xFFFF0008.

The immediate operand is ignored by the ARM. It is normally used by the SWI exception handler as an argument determining which function to perform.

## Example

SWI 0x123456 ; used by the ARM tools to implement semihosting

SWP Swap a word in memory with a register, without interruption

1. SWP<cond> Rd, Rm, [Rn] ARMv2a
2. SWP<cond>B Rd, Rm, [Rn] ARMv2a

Action

1. temp = memory(Rn, 4); memory(Rn, 4) = Rm; Rd = temp;
2. temp = (zero extend)memory(Rn, 1); memory(Rn, 1) = (char)Rm; Rd = temp;

## Notes

- The operations are atomic. They cannot be interrupted partway through.
- Rd, Rm, and Rn must not be pc.
- Rn and Rm must be different registers. Rn and Rd must be different registers.
- Rn should be aligned to the size of the memory transfer.
- If a data abort occurs on the load, then the store does not occur. If a data abort occurs on the store, then Rd is not written.

You can use the SWP instruction to implement 8-bit or 32-bit semaphores on ARMv5 and below. For ARMv6, use LDREX and STREX in preference. As an example, suppose a byte semaphore pointed to by r1 can have the value 0xFF (claimed) or 0x00 (free). The following example claims the lock. If the lock is already claimed, then the code loops, waiting for an interrupt or task switch that will free the lock.

| `MOV` | `r0, #0xFF` | `; value to claim the lock` |
| :--- | :--- | :--- |
| `loop` |  |  |
| `SWPB` | `r0, r0, [r1]` | `; try and claim the lock` |
| `CMP` | `r0, #0xFF` | `; check to see if it was already claimed` |
| `BEQ` | `loop` | `; if so, wait for it to become free` |
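
For comparison, here is the same swap-based byte lock written with C11 atomics, where `atomic_exchange` plays the role of SWPB. This is a sketch under the assumption that the lock byte is an `atomic_uchar`; the function names and `LOCK_` constants are hypothetical, and the compiler, not the programmer, chooses the instruction sequence used for the exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define LOCK_CLAIMED 0xFFu
#define LOCK_FREE    0x00u

/* Atomically writes LOCK_CLAIMED and returns the previous value, the role
 * SWPB r0, r0, [r1] plays in the loop above. */
static uint8_t try_claim(atomic_uchar *lock)
{
    return (uint8_t)atomic_exchange(lock, LOCK_CLAIMED);
}

/* Spins until the value read back shows the lock was free. */
static void claim(atomic_uchar *lock)
{
    while (try_claim(lock) == LOCK_CLAIMED)
        ; /* another task holds the lock: keep retrying */
}

static void release(atomic_uchar *lock)
{
    atomic_store(lock, LOCK_FREE);
}
```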

SXT Byte or halfword extract or extract with accumulate
SXTA

| Syntax | Operands | Architecture |
| :--- | :--- | :--- |
| 1. {S\|U}XTB16<cond> | Rd, Rm {, ROR #8*<rot>} | ARMv6 |
| 2. {S\|U}XTB<cond> | Rd, Rm {, ROR #8*<rot>} | ARMv6 |
| 3. {S\|U}XTH<cond> | Rd, Rm {, ROR #8*<rot>} | ARMv6 |
| 4. {S\|U}XTAB16<cond> | Rd, Rn, Rm {, ROR #8*<rot>} | ARMv6 |
| 5. {S\|U}XTAB<cond> | Rd, Rn, Rm {, ROR #8*<rot>} | ARMv6 |
| 6. {S\|U}XTAH<cond> | Rd, Rn, Rm {, ROR #8*<rot>} | ARMv6 |
| 7. {S\|U}XTB | Ld, Lm | THUMBv3 |
| 8. {S\|U}XTH | Ld, Lm | THUMBv3 |

Action
1. Rd[31:16] = extend(<shifted_Rm>[23:16]); Rd[15:00] = extend(<shifted_Rm>[07:00])
2. Rd = extend(<shifted_Rm>[07:00])
3. Rd = extend(<shifted_Rm>[15:00])
4. Rd[31:16] = Rn[31:16] + extend(<shifted_Rm>[23:16]);
   Rd[15:00] = Rn[15:00] + extend(<shifted_Rm>[07:00])
5. Rd = Rn + extend(<shifted_Rm>[07:00])
6. Rd = Rn + extend(<shifted_Rm>[15:00])
7. Ld = extend(Lm[07:00])
8. Ld = extend(Lm[15:00])

## Notes

- If you specify the S prefix, then extend(x) sign extends x.
- If you specify the U prefix, then extend(x) zero extends x.
- Rd and Rm must not be pc.
- <rot> is an immediate in the range 0 to 3, and <shifted_Rm> is Rm ROR (8*<rot>).
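
The extend operations amount to a rotate by 0, 8, 16, or 24 bits followed by masking and extension. The sketch models three representative formats (SXTB, UXTAB, and UXTB16); the function names are hypothetical, and the remaining formats differ only in field width and extension type.

```c
#include <assert.h>
#include <stdint.h>

/* <shifted_Rm>: Rm ROR (8 * rot). */
static uint32_t ror8(uint32_t rm, unsigned rot)
{
    assert(rot <= 3);
    unsigned s = 8 * rot;
    return s ? (rm >> s) | (rm << (32 - s)) : rm;
}

/* SXTB: sign-extend bits [07:00] of the rotated value. */
static uint32_t sxtb(uint32_t rm, unsigned rot)
{
    uint32_t b = ror8(rm, rot) & 0xFFu;
    return (b ^ 0x80u) - 0x80u; /* unsigned wrap gives the sign-extended bits */
}

/* UXTAB: zero-extend bits [07:00] of the rotated value and add Rn. */
static uint32_t uxtab(uint32_t rn, uint32_t rm, unsigned rot)
{
    return rn + (ror8(rm, rot) & 0xFFu);
}

/* UXTB16: zero-extend bits [07:00] and [23:16] into the two halfwords. */
static uint32_t uxtb16(uint32_t rm, unsigned rot)
{
    return ror8(rm, rot) & 0x00FF00FFu;
}
```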

TEQ Test for equality of two 32-bit values

```
1. TEQ<cond> Rn, #<rotated_immed> ARMv1
2. TEQ<cond> Rn, Rm {, <shift>} ARMv1
```

Action
1. Set the cpsr on the result of (Rn ^ <rotated_immed>)
2. Set the cpsr on the result of (Rn ^ <shifted_Rm>)

Notes

- The cpsr is updated: N = <Negative>, Z = <Zero>, C = <shifter_C> (see Table B1.3).
- If Rn or Rm is pc, then the value used is the address of the instruction plus eight bytes.
- Use this instruction instead of CMP when you want to check for equality and preserve the carry flag.

Example

```
TEQ r0, #1 ; test to see if r0==1
```

TST Test bits of a 32-bit value

```
1. TST<cond> Rn, #<rotated_immed> ARMv1
2. TST<cond> Rn, Rm {, <shift>} ARMv1
3. TST Ln, Lm THUMBv1
```

Action

```
1. Set the cpsr on the result of (Rn & <rotated_immed>)
2. Set the cpsr on the result of (Rn & <shifted_Rm>)
3. Set the cpsr on the result of (Ln & Lm)
```

Notes
- The cpsr is updated: N = <Negative>, Z = <Zero>, C = <shifter_C> (see Table B1.3).
- If Rn or Rm is pc, then the value used is the address of the instruction plus eight bytes.

Use this instruction to test whether a selected set of bits are all zero．

## Example

```
TST r0, #0xFF ; test if the bottom 8 bits of r0 are 0
```
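
As a sketch, the N and Z results of TST and TEQ can be modeled as below. Neither instruction writes a destination register; C comes from the shifter (Table B1.3) and V is left unchanged, so this model, whose names are hypothetical, computes only N and Z.

```c
#include <assert.h>
#include <stdint.h>

struct nz { int n, z; };

/* TST: flags from Rn AND operand2; the result itself is discarded. */
static struct nz tst(uint32_t rn, uint32_t op2)
{
    uint32_t r = rn & op2;
    struct nz f = { (int)(r >> 31), r == 0 };
    return f;
}

/* TEQ: flags from Rn EOR operand2; Z is set exactly when rn == op2. */
static struct nz teq(uint32_t rn, uint32_t op2)
{
    uint32_t r = rn ^ op2;
    struct nz f = { (int)(r >> 31), r == 0 };
    return f;
}
```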

UADD Unsigned parallel modulo add (see the entry for SADD)

UHADD Unsigned halving add and subtract (see the entry for SHADD)
UHSUB

UMAAL Unsigned multiply accumulate accumulate long

1. UMAAL<cond> RdLo, RdHi, Rm, Rs ARMv6

Action

1. RdHi:RdLo = (unsigned)Rm*Rs + (unsigned)RdLo + (unsigned)RdHi

Notes

- RdHi and RdLo must be different registers.
- RdHi, RdLo, Rm, and Rs must not be pc.
- This operation cannot overflow because $(2^{32}-1)(2^{32}-1)+(2^{32}-1)+(2^{32}-1)=2^{64}-1$. You can use it to synthesize the multiword multiplications used by public key cryptosystems.
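
To illustrate the multiword use mentioned in the Notes, the sketch below builds a schoolbook multiplication of little-endian arrays of 32-bit limbs in which every inner step is one UMAAL: the partial-product limb already in the result is the RdLo addend and the running carry is the RdHi addend. The function names and limb layout are assumptions of this sketch, not a library API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* UMAAL model: RdHi:RdLo = Rm * Rs + RdLo + RdHi, all unsigned.
 * The result always fits in 64 bits. */
static void umaal(uint32_t *lo, uint32_t *hi, uint32_t rm, uint32_t rs)
{
    uint64_t r = (uint64_t)rm * rs + *lo + *hi;
    *lo = (uint32_t)r;
    *hi = (uint32_t)(r >> 32);
}

/* r = a * b with little-endian 32-bit limbs; r needs na + nb limbs and
 * must not overlap a or b. */
static void bignum_mul(uint32_t *r, const uint32_t *a, size_t na,
                       const uint32_t *b, size_t nb)
{
    for (size_t i = 0; i < na + nb; i++)
        r[i] = 0;
    for (size_t i = 0; i < na; i++) {
        uint32_t carry = 0;
        for (size_t j = 0; j < nb; j++) {
            uint32_t lo = r[i + j];
            umaal(&lo, &carry, a[i], b[j]); /* one UMAAL per limb pair */
            r[i + j] = lo;
        }
        r[i + nb] = carry;
    }
}
```

Because UMAAL absorbs both the existing limb and the carry without overflow, the inner loop needs no separate carry-propagation step.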

UMLAL Unsigned long multiply and multiply accumulate (see the SMLAL and SMULL entries)
UMULL

UQADD Unsigned saturated add and subtract (see the QADD entry)
UQSUB

USAD Unsigned sum of absolute differences

| Syntax | Operands | Architecture |
| :--- | :--- | :--- |
| 1. USAD8<cond> | Rd, Rm, Rs | ARMv6 |
| 2. USADA8<cond> | Rd, Rm, Rs, Rn | ARMv6 |

Action
1. Rd = abs(Rm[31:24]-Rs[31:24]) + abs(Rm[23:16]-Rs[23:16]) + abs(Rm[15:08]-Rs[15:08]) + abs(Rm[07:00]-Rs[07:00])
2. Rd = Rn + abs(Rm[31:24]-Rs[31:24]) + abs(Rm[23:16]-Rs[23:16]) + abs(Rm[15:08]-Rs[15:08]) + abs(Rm[07:00]-Rs[07:00])

Note
- abs(x) returns the absolute value of x. The bytes of Rm and Rs are treated as unsigned values.
- Rd, Rm, and Rs must not be pc.
- The sum of absolute differences operation is common in video codecs, where it provides a metric to measure how similar two blocks of image pixels are.
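
A C model of USADA8 follows, along with a hypothetical 8x8 block comparison of the kind a video encoder's motion search might use, accumulating the sum of absolute differences four pixels at a time. The pixel packing (four 8-bit pixels per word) and the function names are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* USADA8 model: Rd = Rn + sum of |Rm byte - Rs byte| over four unsigned
 * bytes. USAD8 is the same operation with Rn = 0. */
static uint32_t usada8(uint32_t rm, uint32_t rs, uint32_t rn)
{
    uint32_t sum = rn;
    for (int i = 0; i < 4; i++) {
        uint32_t a = (rm >> (8 * i)) & 0xFFu;
        uint32_t b = (rs >> (8 * i)) & 0xFFu;
        sum += a > b ? a - b : b - a;
    }
    return sum;
}

static uint32_t usad8(uint32_t rm, uint32_t rs) { return usada8(rm, rs, 0); }

/* Hypothetical use: SAD between two 8x8 blocks stored as 16 words each. */
static uint32_t block_sad8x8(const uint32_t *cur, const uint32_t *ref)
{
    uint32_t sad = 0;
    for (int i = 0; i < 16; i++) /* 64 pixels = 16 words */
        sad = usada8(cur[i], ref[i], sad);
    return sad;
}
```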

USAT Unsigned saturation instruction (see the SSAT entry)
USUB Unsigned parallel modulo subtracts (see the SADD entry)
UXT Unsigned extract, extract with accumulate (see the entry for SXT)

### B1.4 ARM Assembler Quick Reference

This section summarizes the more useful commands and expressions available with the ARM assembler, armasm. Each assembly line has one of the following formats:

```
{<label>} {<instruction>} ; comment
{<symbol>} <directive> ; comment
{<arg_0>} <macro> {<arg_1>} {,<arg_2>} .. {,<arg_n>} ; comment
```

where

- <instruction> is any ARM or Thumb instruction supported by the processor you are assembling for. See Section B1.3.
- <label> is the name of a symbol to store the address of the instruction.
- <directive> is an ARM assembler directive. See Section ARM Assembler Directives.
- <symbol> is the name of a symbol used by the <directive>.
- <macro> is the name of a new directive defined using the MACRO directive.
- <arg_k> is the kth macro argument.

You must use an AREA directive to define an area before any ARM or Thumb instructions appear. All assembly files must finish with the END directive. The following example shows a simple assembly file defining a function add that returns the sum of the two input arguments:

```
    AREA maths_routines, CODE, READONLY
    EXPORT add              ; give the symbol add external linkage
add ADD r0, r0, r1          ; add input arguments
    MOV pc, lr              ; return from sub-routine
    END
```
## ARM Assembler Variables

The ARM assembler supports three types of assembly-time variables (see Table B1.14). Variable names are case sensitive and must be declared before use with the directives GBLx or LCLx.

## TABLE B1.14 ARM assembler variable types.

| Variable type | Declare <br> globally | Declare locally <br> to a macro | Set value | Example <br> values |
| :--- | :---: | :---: | :---: | :--- |
| Unsigned 32-bit <br> integer | GBLA | LCLA | SETA | 15 , 0xab |
| ASCII string | GBLS | LCLS | SETS | "", "ADD" |
| Logical | GBLL | LCLL | SETL | \{TRUE\}, \{FALSE\} |

You can use variables in expressions (see Section ARM Assembler Expressions), or substitute their value at assembly time using the `$` operator. Specifically, `$name.` expands to the value of the variable `name` before the line is assembled. You can omit the final period if `name` is not followed by an alphanumeric character or underscore. Use `$$` to produce a single `$`. Arithmetic variables expand to an eight-digit hexadecimal string on substitution. Logical variables expand to T or F.

The following example code shows how to declare and substitute variables of each type:

```
    ; arithmetic variables
        GBLA    count           ; declare an integer variable count
count   SETA    1               ; set count = 1
        WHILE   count<15
        BL      test$count      ; call test00000001, test00000002 ...
count   SETA    count+1         ; ... test0000000E
        WEND
    ; string variables
        GBLS    cc              ; declare a string variable called cc
cc      SETS    "NE"            ; set cc="NE"
        ADD$cc  r0, r0, r0      ; assembles as ADDNE r0,r0,r0
        STR$cc.B r0, [r1]       ; assembles as STRNEB r0,[r1]
    ; logical variables
        GBLL    debug           ; declare a logical variable called debug
debug   SETL    {TRUE}          ; set debug={TRUE}
        IF      debug           ; if debug is TRUE then
        BL      print_debug     ; print out some debug information
        ENDIF
```
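To make the substitution rule concrete, the following C sketch models the text the assembler produces for test$count in the WHILE loop above. It assumes uppercase hexadecimal digits, as in the test0000000E comment, and the function name substitute_arith is hypothetical; it models only the text expansion, not the assembler.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of "$name" substitution for an arithmetic (GBLA/LCLA)
   variable: the value is written as eight hexadecimal digits and appended to
   the text that precedes it, e.g. "test" + 14 -> "test0000000E".
   Returns the length of the resulting string, as snprintf does. */
static int substitute_arith(char *out, size_t size, const char *prefix,
                            uint32_t value)
{
    return snprintf(out, size, "%s%08" PRIX32, prefix, value);
}
```

For count = 1 this yields test00000001, and for count = 14, the last pass of the loop, test0000000E.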


## ARM Assembler Labels

A label definition must begin on the first character of a line. The assembler treats indented text as an instruction, directive, or macro. It treats labels of the form <N><name> as local labels, where <N> is an integer in the range 0 to 99 and <name> is an optional textual name. Local labels are limited in scope by the ROUT directive. To reference a local label, you refer to it as %{F|B}{A|T}<N>{<name>}. The extra prefix letters tell the assembler how to search for the label:

- If you specify F, the assembler searches forward; if B, then the assembler searches backwards. Otherwise the assembler searches backwards and then forwards.
- If you specify T, the assembler searches the current macro only; if A, then the assembler searches all macro levels. Otherwise the assembler searches the current and higher macro nesting levels.
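The direction rules above can be sketched in C as follows. This is a simplified model under stated assumptions: it covers a single ROUT area, ignores the A and T macro-level letters and the optional <name> suffix, and treats a definition on the same line as the reference as lying backwards. The struct and function names are hypothetical.

```c
#include <stddef.h>

/* One local label definition: its numeric part and its source line. */
struct local_label {
    int number;  /* the <N> part, 0 to 99 */
    int line;    /* source line on which the label is defined */
};

/* Hypothetical model of how %F<N>, %B<N> and %<N> resolve inside one ROUT
   area. labels[] must be sorted by line. dir is 'F', 'B', or 0 for neither.
   Returns the index of the chosen definition, or -1 if there is none.
   Assumption: a definition on the reference line counts as "backwards". */
static int resolve_local(const struct local_label *labels, size_t n,
                         int ref_line, int number, char dir)
{
    int back = -1;  /* nearest definition at or before the reference */
    int fwd = -1;   /* nearest definition after the reference */

    for (size_t i = 0; i < n; i++) {
        if (labels[i].number != number)
            continue;
        if (labels[i].line <= ref_line)
            back = (int)i;       /* later matches are nearer, so overwrite */
        else if (fwd < 0)
            fwd = (int)i;        /* keep only the first match after it */
    }

    if (dir == 'F')
        return fwd;
    if (dir == 'B')
        return back;
    return back >= 0 ? back : fwd;  /* neither: backwards, then forwards */
}
```

With local label 1 defined on lines 10, 20, and 30, a reference %B1 on line 25 resolves to the definition on line 20, %F1 to line 30, and a plain %1 to line 20.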

## ARM Assembler Expressions

The ARM assembler can evaluate a number of numeric, string, and logical expressions at assembly time. Table B1.15 shows some of the unary and binary operators you can use within expressions. Brackets can be used to change the order of evaluation in the usual way.

TABLE B1.15 ARM assembler unary and binary operators.

| Expression | Result | Example |
| :--- | :--- | :--- |
| A+B, A-B | A plus or minus B | 1-2 = 0xffffffff |
| A*B, A/B | A multiplied by or divided by B | 2*3 = 6, 7/3 = 2 |
| A:MOD:B | A modulo B | 7:MOD:3 = 1 |
| :CHR:A | string with ASCII code A | :CHR:32 = " " |
| 'X' | the ASCII value of X | 'a' = 0x61 |
| :STR:A, :STR:L | A or L converted to a string | :STR:32 = "00000020", :STR:{TRUE} = "T" |
| A<<B, A:SHL:B | A shifted left by B bits | 1<<3 = 8 |
| A>>B, A:SHR:B | A shifted right by B bits (logical shift) | 0x80000000>>4 = 0x08000000 |
| A:ROR:B, A:ROL:B | A rotated right/left by B bits | 1:ROR:1 = 0x80000000, 0x80000000:ROL:1 = 1 |
| A=B, A>B, A>=B, A<B, A<=B, A/=B, A<>B | comparison of arithmetic or string variables (/= and <> both mean not equal) | (1=2) = {FALSE}, (1<2) = {TRUE}, ("a"="c") = {FALSE}, ("a"<"c") = {TRUE} |
| A:AND:B, A:OR:B, A:EOR:B, :NOT:A | bitwise AND, OR, exclusive OR of A and B; bitwise NOT of A | 1:AND:3 = 1, 1:OR:3 = 3, :NOT:0 = 0xFFFFFFFF |
| :LEN:S | length of the string S | :LEN:"ABC" = 3 |
| S:LEFT:B, S:RIGHT:B | leftmost or rightmost B characters of S | "ABC":LEFT:2 = "AB", "ABC":RIGHT:2 = "BC" |
| S:CC:T | the concatenation of S and T | "AB":CC:"C" = "ABC" |
| L:LAND:M, L:LOR:M, L:LEOR:M | logical AND, OR, exclusive OR of L and M | {TRUE}:LAND:{FALSE} = {FALSE} |
| :DEF:X | {TRUE} if a variable called X is defined | |
| :BASE:A, :INDEX:A | see the MAP directive | |
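The integer examples in Table B1.15 behave like unsigned 32-bit arithmetic in C: for instance, 1-2 wraps to 0xffffffff, and >> is a logical shift. C has no rotate operator, so the following sketch defines equivalents of :ROR: and :ROL: on uint32_t. The names ror32 and rol32 are our own, and the sketch reduces the rotate count modulo 32, which may not match how the assembler treats counts of 32 or more.

```c
#include <stdint.h>

/* :ROR: equivalent. The count is reduced modulo 32 so that both C shifts
   stay in the defined range 0 to 31. */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x >> n) | (x << ((32u - n) & 31u));
}

/* :ROL: equivalent: rotating left by n is rotating right by 32 - n. */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return ror32(x, (32u - (n & 31u)) & 31u);
}
```

With these, 1:ROR:1 corresponds to ror32(1, 1), which is 0x80000000, and 0x80000000:ROL:1 to rol32(0x80000000, 1), which is 1. The other integer rows map directly onto C's +, -, *, /, %, <<, >>, &, |, ^, and ~ applied to uint32_t values.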

In Table B1.15, A and B represent arbitrary integers; S and T, strings; and L and M, logical values. You can use labels and other symbols in place of integers in many expressions.

TABLE B1.16 Predefined expressions.

| Variable | Value |
| :--- | :--- |
| {ARCHITECTURE} | The ARM architecture of the CPU ("4T" for ARMv4T) |
| {ARMASM_VERSION} | The assembler version number |
| {CONFIG} or {CODESIZE} | The bit width of the instructions being assembled (32 for ARM state, 16 for Thumb state) |
| {CPU} | The name of the CPU being assembled for |
| {ENDIAN} | The configured endianness, "big" or "little" |
| {INTER} | {TRUE} if ARM/Thumb interworking is on |
| {PC} (alias .) | The address of the current instruction being assembled |
| {ROPI}, {RWPI} | {TRUE} if read-only/read-write position independent |
| {VAR} (alias @) | The MAP counter (see the MAP directive) |

## Predefined Variables

Table B1.16 shows a number of special variables that can appear in expressions. These are predefined by the assembler, and you cannot override them.

## ARM Assembler Directives

Here is an alphabetical list of the more common armasm directives.

## ALIGN

```
ALIGN {<expression>{, <offset>}}
```

Aligns the address of the next instruction to the form q*<expression>+<offset> for some integer q. The alignment is relative to the start of the ELF section, so the section itself must be aligned appropriately (see the AREA directive). <expression> must be a power of two; the default is 4. <offset> is zero if not specified.
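The alignment rule can be written as a small C function. This sketch, with the hypothetical name align_to, returns the smallest offset not below addr that has the form q*<expression>+<offset>. Addresses are offsets from the start of the section, the alignment must be a power of two, and the offset argument is assumed to be smaller than the alignment.

```c
#include <stdint.h>

/* Section offset that ALIGN <align>, <offset> moves to from offset addr:
   the smallest value >= addr of the form q*align + offset.
   align must be a power of two; offset is assumed to be < align. */
static uint32_t align_to(uint32_t addr, uint32_t align, uint32_t offset)
{
    /* (offset - addr) mod align is the number of padding bytes needed;
       unsigned wrap-around makes this work when offset < addr. */
    uint32_t pad = (offset - addr) & (align - 1u);
    return addr + pad;
}
```

For example, ALIGN 4 at offset 5 moves to offset 8, while ALIGN 4, 2 at offset 7 moves to offset 10 (2*4 + 2).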

## AREA

```
AREA <section> {,<attr_1>} {,<attr_2>} ... {,<attr_k>}
```

Starts a new code or data section of name <section>. Table B1.17 lists the possible attributes.

TABLE B1.17 AREA attributes.

| Attribute | Meaning |
| :--- | :--- |
| ALIGN=<expression> | Align the ELF section to a $2^{\text{expression}}$-byte boundary. |
| ASSOC=<sectionname> | If this section is linked, also link <sectionname>. |
| CODE | The section contains instructions and is read-only. |
| DATA | The section contains data and is read-write. |
| NOINIT | The data section does not require initialization. |
| READONLY | The section is read-only. |
| READWRITE | The section is read-write. |

## ASSERT

```
ASSERT <logical-expression>
```

Assembly-time assertion. If the logical expression is false, assembly terminates with an error.

## CN

```
<name> CN <numeric-expression>
```

Set <name> to be an alias for coprocessor register <numeric-expression>.


## CODE16, CODE32

CODE16 tells the assembler to assemble the following instructions as 16-bit Thumb instructions. CODE32 indicates 32-bit ARM instructions (the default for armasm).

## CP

```
<name> CP <numeric-expression>
```

Set <name> to be an alias for coprocessor number <numeric-expression>.

## DATA

```
<label> DATA
```

The DATA directive indicates that the label points to data rather than code. In Thumb mode this prevents the linker from setting the bottom bit of the label. Bit 0 of a function pointer or code label is 0 for ARM code and 1 for Thumb code (see the BX instruction).
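The bit-0 convention described above is the one the BX instruction applies to its target register. The following C sketch models only that decoding step; the struct and function names are hypothetical. It shows why a Thumb data label must keep bit 0 clear: with bit 0 set, the label's value would no longer be the address of the data.

```c
#include <stdint.h>

/* How BX interprets its target register: bit 0 selects the instruction set
   (1 = Thumb, 0 = ARM) and is cleared to form the branch address.
   Additional ARM-state alignment requirements are not checked here. */
struct bx_target {
    uint32_t address;  /* where execution continues */
    int thumb;         /* 1 if the target is Thumb code */
};

static struct bx_target bx_decode(uint32_t rm)
{
    struct bx_target t = { rm & ~1u, (int)(rm & 1u) };
    return t;
}
```

For example, bx_decode(0x8001) gives address 0x8000 with thumb set, while bx_decode(0x8000) gives an ARM target at the same address.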

## DCB, DCD\{U\}, DCI, DCQ\{U\}, DCW\{U\}

These directives allocate one or more bytes of initialized memory according to Table B1.18. Follow each directive with a comma-separated list of initialization values. If you specify the optional $U$ suffix, then the assembler does not insert any alignment padding.

Examples

```
hello   DCB     "hello", 0
powers  DCD     1, 2, 4, 8, 0x10, 0x20, 0x40, 0x80
        DCI     0xEA000000
```

TABLE B1.18 Memory initialization directives.

| Directive | Alias | Data size (bytes) | Initialization value |
| :---: | :---: | :---: | :--- |
| DCB | = | 1 | byte or string |
| DCW |  | 2 | 16-bit integer (aligned to 2 bytes) |
| DCD | & | 4 | 32-bit integer (aligned to 4 bytes) |
| DCQ |  | 8 | 64-bit integer (aligned to 4 bytes) |
| DCI |  | 2 or 4 | integer defining an ARM or Thumb <br> instruction |
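The alignment padding implied by Table B1.18, and its suppression by the U suffix, can be sketched in C as follows. The function names and the single-letter kind codes ('B', 'W', 'D', 'Q') are assumptions of this sketch, and DCI is not modeled. For example, once hello DCB "hello", 0 has occupied offsets 0 to 5, a following DCD starts at offset 8, whereas DCDU starts at offset 6.

```c
#include <stdbool.h>
#include <stdint.h>

/* Alignment from Table B1.18 for each directive; DCB needs none.
   The kind letters are a convention of this sketch. */
static uint32_t dc_alignment(char kind)
{
    switch (kind) {
    case 'W': return 2;   /* DCW */
    case 'D': return 4;   /* DCD */
    case 'Q': return 4;   /* DCQ: 8 bytes of data, but only 4-byte aligned */
    default:  return 1;   /* DCB */
    }
}

/* Section offset at which a DCB/DCW/DCD/DCQ directive's data starts, given
   the current offset. With the U suffix (unaligned) no padding is added. */
static uint32_t dc_start(uint32_t offset, char kind, bool unaligned)
{
    uint32_t a = unaligned ? 1u : dc_alignment(kind);
    return (offset + a - 1u) & ~(a - 1u);
}
```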

## ELSE (alias |)

See IF.

## END

This directive must appear at the end of a source file. Assembler source after an END directive is ignored.

## ENDFUNC (alias ENDP), ENDIF (alias ])

See FUNCTION and IF, respectively.

## ENTRY

This directive specifies the program entry point for the linker. The entry point is usually contained in the ARM C library.

## EQU (alias *)

```
<name> EQU <numeric-expression>
```

This directive is similar to #define in C. It defines a symbol <name> with value defined by the expression. This value cannot be redefined. See Section ARM Assembler Variables for the use of redefinable variables.

## EXPORT (alias GLOBAL)

```
EXPORT <symbol>{[WEAK]}
```

Assembler symbols are local to the object file unless exported using this directive. You can link exported symbols with other object and library files. The optional [WEAK] suffix indicates that the linker should try to resolve references with other instances of this symbol before using this instance.

## EXTERN, IMPORT

```
EXTERN <symbol>{[WEAK]}
IMPORT <symbol>{[WEAK]}
```

Both of these directives declare the name of an external symbol, defined in another object file or library. If you use this symbol, then the linker will resolve it at link time. For IMPORT, the symbol will be resolved even if you don't use it. For EXTERN, only used symbols are resolved. If you declare the symbol as [WEAK], then no error is generated if the linker cannot resolve the symbol; instead the symbol takes the value 0.

## FIELD (alias \#)

See MAP.

## FUNCTION (alias PROC) and ENDFUNC (alias ENDP)

The FUNCTION and ENDFUNC directives mark the start and end of an ATPCS-compliant function. Their main use is to improve the debug view and allow backtracing of function calls during debugging. They also allow the profiler to profile assembly functions more accurately. You must label the FUNCTION directive with the ATPCS function name. For example:

```
sub     FUNCTION
        SUB     r0, r0, r1
        MOV     pc, lr
        ENDFUNC
```


## GBLA, GBLL, GBLS

These directives define global arithmetic, logical, and string variables, respectively. See Section ARM Assembler Variables.

## GET

See INCLUDE.

## GLOBAL

See EXPORT.

## IF (alias [), ELSE (alias |), ENDIF (alias ])

These directives provide for conditional assembly. They are similar to #if, #else, and #endif in C. The IF directive is followed by a logical expression. The ELSE directive may be omitted. For example:

```
        IF      {ARCHITECTURE}="5TE"
        SMULBB  r0, r1, r1
        ELSE
        MUL     r0, r1, r1
        ENDIF
```


## IMPORT

See EXTERN.

## INCBIN

```
INCBIN <filename>
```

This directive includes the raw data contained in the binary file <filename> at the current point in the assembly. For example, INCBIN table.dat.

## INCLUDE (alias GET)

```
INCLUDE <filename>
```

Use this directive to include another assembly file. It is similar to the #include command in C. For example, INCLUDE header.h.

## INFO (alias !)

```
INFO <numeric_expression>, <string_expression>
```

If <numeric_expression> is nonzero, then assembly terminates with error <string_expression>. Otherwise the assembler prints <string_expression> as an information message.

## KEEP

```
KEEP {<symbol>}
```

By default the assembler does not include local symbols in the object file, only exported symbols (see EXPORT). Use KEEP to include all local symbols or a specified local symbol. This aids the debug view.

## LCLA, LCLL, LCLS

These directives declare macro-local arithmetic, logical, and string variables, respectively. See Section ARM Assembler Variables.

## LTORG

Use LTORG to insert a literal pool. The assembler uses literal pools to store the constants appearing in the LDR Rd, =<value> instruction. See LDR format 19. Usually the assembler inserts literal pools automatically, at the end of each area. However, if an area is too large, then the LDR instruction cannot reach this literal pool using pc-relative addressing. Then you need to insert a literal pool manually, near the LDR instruction.
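The reach problem that LTORG solves can be quantified for ARM state. The following C sketch, with a hypothetical function name, checks whether a literal at literal_addr is within range of an LDR at ldr_addr, assuming the ARM-state rules: a 12-bit offset (up to 4095 bytes either way) measured from the LDR address plus 8. Thumb-state literal loads have a shorter, forward-only range of up to 1020 bytes, which this sketch does not model.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARM state: LDR Rd, [pc, #offset] encodes a 12-bit offset magnitude
   (0 to 4095, added or subtracted), and pc reads as the LDR's address + 8. */
static bool arm_literal_in_range(uint32_t ldr_addr, uint32_t literal_addr)
{
    int64_t offset = (int64_t)literal_addr - ((int64_t)ldr_addr + 8);
    return offset >= -4095 && offset <= 4095;
}
```

If this check fails for the end of the area, an LTORG placed closer to the LDR brings the pool back within range.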

## MACRO, MEXIT, MEND

Use these directives to declare a new assembler macro or pseudoinstruction. The syntax is

```
        MACRO
{$<arg_0>} <macro_name> {$<arg_1>} {,$<arg_2>} ... {,$<arg_k>}
<macro_code>
        MEND
```

The macro parameters are stored in the dummy variables \$<arg_i>. A parameter is set to the empty string if you don't supply it when calling the macro. The MEXIT directive terminates the macro early and is usually used inside IF statements. For example, the following macro defines a new pseudoinstruction SMUL, which evaluates to SMULBB on an ARMv5TE processor and to MUL otherwise.

```
        MACRO
$label  SMUL    $a, $b, $c
        IF      {ARCHITECTURE}="5TE"
$label  SMULBB  $a, $b, $c
        MEXIT
        ENDIF
$label  MUL     $a, $b, $c
        MEND
```
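SMULBB multiplies the signed bottom halfwords of its operands, whereas MUL keeps the low 32 bits of the full product, so the two branches of the SMUL macro appear to agree only when both operands already hold signed 16-bit values. The following C sketch, with hypothetical function names, models both instructions to show where they diverge.

```c
#include <stdint.h>

/* Sign-extend the bottom 16 bits of a register value, portably. */
static int32_t bottom_half(uint32_t r)
{
    int32_t v = (int32_t)(r & 0xFFFFu);
    return (v & 0x8000) ? v - 0x10000 : v;
}

/* SMULBB: signed 16 x 16 -> 32-bit product of the bottom halfwords. */
static uint32_t smulbb(uint32_t rm, uint32_t rs)
{
    return (uint32_t)(bottom_half(rm) * bottom_half(rs));
}

/* MUL: bottom 32 bits of the 32 x 32-bit product. */
static uint32_t mul(uint32_t rm, uint32_t rs)
{
    return rm * rs;
}
```

Both give 0xFFFFFFFA for -2 times 3, but for 0x10002 times 3, MUL gives 0x30006 while SMULBB sees only the bottom halfword 2 and gives 6.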


## MAP (alias ^), FIELD (alias #)

These directives define objects similar to C structures. MAP sets the base address or offset of a structure, and FIELD defines structure elements. The syntax is

```
MAP <base> {, <base_register>}
<name> FIELD <field_size_in_bytes>
```

The MAP directive sets the value of the special assembler variable {VAR} to the base address of the structure. This is either the value <base> or the register-relative value <base_register>+<base>. Each FIELD directive sets <name> to the value of {VAR} and increments {VAR} by the specified number of bytes. For register-relative values, the expressions :INDEX:<name> and :BASE:<name> return the element offset from the base register and the base register number, respectively.

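In practice the base-register form is not that useful. Instead you can use the plain form and name the base register explicitly in each instruction. This allows you to point to structures of the same type through different base registers. For example, to set up a structure of two int variables on the stack, describe it with MAP 0 followed by one FIELD 4 directive per element, reserve the structure's size by subtracting it from sp, and access each element as [sp, #<name>].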
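The bookkeeping that MAP and FIELD perform on the {VAR} counter can be modeled in C. In this sketch, which uses the hypothetical names map, field, and two_ints, MAP 0 followed by two FIELD 4 directives yields the offsets 0 and 4. These match the member offsets of a C structure of two 32-bit int variables, and the final counter value equals the structure size.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the {VAR} counter used by MAP and FIELD. */
struct map_counter {
    uint32_t var;
};

/* MAP <base>: set {VAR} to the structure's base. */
static void map(struct map_counter *m, uint32_t base)
{
    m->var = base;
}

/* <name> FIELD <size>: <name> takes the current {VAR}, then {VAR}
   advances by size bytes. Returns the value assigned to <name>. */
static uint32_t field(struct map_counter *m, uint32_t size)
{
    uint32_t offset = m->var;
    m->var += size;
    return offset;
}

/* The C structure the same directives describe (hypothetical names). */
struct two_ints {
    int32_t a;
    int32_t b;
};
```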


## NOFP

This directive bans the use of floating-point instructions in the assembly file. We don't cover floating-point instructions and directives in this appendix.

## OPT

The OPT directive controls the formatting of the armasm -list option. This is seldom used now that source-level debugging is available. See the armasm documentation.

## PROC

See FUNCTION.

## RLIST, RN

```
<name> RN <numeric-expression>
<name> RLIST <list of ARM registers enclosed in {}>
```

These directives name a list of ARM registers or a single ARM register. For example, the following code names r0 as arg and the ATPCS preserved registers as saved.

```
arg RN 0
saved RLIST {r4-r11}
```


## ROUT

The ROUT directive defines a new local label area. See Section ARM Assembler Labels.

## SETA, SETL, SETS

These directives set the values of arithmetic, logical, and string variables, respectively. See Section ARM Assembler Variables.

## SPACE (alias %)

```
{<label>} SPACE <numeric_expression>
```

This directive reserves <numeric_expression> bytes of space. The bytes are zero initialized.

## WHILE, WEND

These directives supply an assemble-time looping structure. WHILE is followed by a logical expression. While this expression is true, the assembler repeats the code between WHILE and WEND. The following example shows how to create an array of powers of two from 1 to 65,536.

```
        GBLA    count
count   SETA    1
        WHILE   count<=65536
        DCD     count
count   SETA    2*count
        WEND
```
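For comparison, the following C sketch (hypothetical function name powers_of_two) computes the same values that the WHILE loop above emits with DCD: 17 words, from 1 (2^0) up to 65,536 (2^16).

```c
#include <stddef.h>
#include <stdint.h>

/* Fills out[] with the values the WHILE/WEND loop emits as DCD words:
   count starts at 1 and doubles while count <= limit. Returns the number
   of words produced. limit should stay below 0x80000000 so that doubling
   cannot wrap around, as it would for a 32-bit assembler variable. */
static size_t powers_of_two(uint32_t *out, size_t cap, uint32_t limit)
{
    size_t n = 0;
    for (uint32_t count = 1; count <= limit && n < cap; count *= 2)
        out[n++] = count;
    return n;
}
```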


## B1.5 GNU Assembler Quick Reference

This section summarizes the more useful commands and expressions available with the GNU assembler, gas, when you target this assembler for ARM. Each assembly line has the format

```
{<label>:} {<instruction or directive>} @ comment
```

Unlike the ARM assembler, you needn't indent instructions and directives. Labels are recognized by the following colon rather than their position at the start of the line. The following example shows a simple assembly file defining a function add that returns the sum of the two input arguments:

```
.section .text, "ax"
.global add             @ give the symbol add external linkage
add:
    ADD r0, r0, r1      @ add input arguments
    MOV pc, lr          @ return from subroutine
```


## GNU Assembler Directives

Here is an alphabetical list of the more common gas directives.

```
.ascii "<string>"
```

Inserts the string as data into the assembly, as for DCB in armasm.

```
.asciz "<string>"
```

As for .ascii but follows the string with a zero byte.

```
.balign <power_of_2> {,<fill_value> {,<max_padding>} }
```

Aligns the address to <power_of_2> bytes. The assembler aligns by adding bytes of value <fill_value> or a suitable default. The alignment does not occur if more than <max_padding> fill bytes are required. Similar to ALIGN in armasm.
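The following C sketch, with a hypothetical function name, computes how many fill bytes .balign emits, including the <max_padding> cut-off. Passing UINT32_MAX for max_padding stands in for omitting the argument; that sentinel is a convention of this sketch, not of gas.

```c
#include <stdint.h>

/* Number of fill bytes ".balign align, fill, max_padding" emits at address
   addr. align must be a power of two. Pass UINT32_MAX as max_padding to
   model an omitted <max_padding>. If more than max_padding bytes would be
   needed, the alignment is skipped and nothing is emitted. */
static uint32_t balign_padding(uint32_t addr, uint32_t align,
                               uint32_t max_padding)
{
    uint32_t pad = (0u - addr) & (align - 1u);  /* bytes to next boundary */
    return pad > max_padding ? 0u : pad;
}
```

For example, at address 0x1001, .balign 4 emits 3 bytes, but .balign 4, 0, 2 emits none because 3 bytes would exceed the limit.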

```
.byte <byte1> {,<byte2>} ...
```

Inserts a list of byte values as data into the assembly, as for DCB in armasm.

```
.code <number_of_bits>
```

Sets the instruction width in bits. Use 16 for Thumb and 32 for ARM assembly. Similar to CODE16 and CODE32 in armasm.

```
.else
```

Use with .if and .endif. Similar to ELSE in armasm.

```
.end
```

Marks the end of the assembly file. This is usually omitted.

```
.endif
```

Ends a conditional assembly code block. See .if, .ifdef, .ifndef. Similar to ENDIF in armasm.

```
.endm
```

Ends a macro definition. See .macro. Similar to MEND in armasm.

```
.endr
```

Ends a repeat loop. See .rept and .irp. Similar to WEND in armasm.

```
.equ <symbol_name>, <value>
```

This directive sets the value of a symbol. It is similar to EQU in armasm.

```
.err
```

Causes assembly to halt with an error.

```
.exitm
```

Exits a macro partway through. See .macro. Similar to MEXIT in armasm.

```
.global <symbol>
```

This directive gives the symbol external linkage. It is similar to EXPORT in armasm.

```
.hword <short1> {,<short2>} ...
```

Inserts a list of 16-bit values as data into the assembly, as for DCW in armasm.

```
.if <logical_expression>
```

Makes a block of code conditional. End the block using .endif. Similar to IF in armasm. See also .else.

```
.ifdef <symbol>
```

Include a block of code if <symbol> is defined. End the block with .endif.

```
.ifndef <symbol>
```

Include a block of code if <symbol> is not defined. End the block with .endif.

```
.include "<filename>"
```

Includes the indicated source file. Similar to INCLUDE in armasm or #include in C.

```
.irp <param> {,<val_1>} {,<val_2>} ...
```

Repeats a block of code, once for each value in the value list. Mark the end of the block using a .endr directive. In the repeated code block, write the parameter name preceded by a backslash to substitute the associated value from the value list.

```
.macro <name> {<arg_1>} {,<arg_2>} ... {,<arg_k>}
```

Defines an assembler macro called <name> with k parameters. The macro definition must end with .endm. To escape from the macro at an earlier point, use .exitm. These directives are similar to MACRO, MEND, and MEXIT in armasm. Inside the macro body, refer to each dummy parameter by prefixing its name with a backslash. For example:

```
.macro SHIFTLEFT a, b
    .if \b < 0
        MOV \a, \a, ASR 非-\b
        .exitm
        .endif
        MOV \a, \a, LSL 非\
.endm
.rept <number_of_times>
```

Repeats a block of code the given number of times. End the block with .endr.

```
<alias_name> .req <register_name>
```

This directive names a register. It is similar to the RN directive in armasm except that you must supply a name rather than a number on the right. For example, acc .req r0.

```
.section <section_name> {,"<flags>"}
```

Starts a new code or data section. Usually you should call a code section .text, an initialized data section .data, and an uninitialized data section .bss. These have default flags, and the linker understands these default names. The directive is similar to the armasm directive AREA. Table B1.19 lists the characters that can appear in the <flags> string for ELF format files.
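In ELF object files, the flag characters listed in Table B1.19 correspond to section header flag bits. The following C sketch converts a flags string into those bits, using the values from the ELF specification (SHF_WRITE = 0x1, SHF_ALLOC = 0x2, SHF_EXECINSTR = 0x4); the function name is hypothetical. A code section conventionally uses "ax" and an initialized data section "aw".

```c
#include <stdint.h>

/* ELF section header flag bits, as defined by the ELF specification. */
#define SHF_WRITE     0x1u
#define SHF_ALLOC     0x2u
#define SHF_EXECINSTR 0x4u

/* Converts a .section flags string such as "ax" into ELF sh_flags bits.
   Only the three characters in Table B1.19 are handled; others are ignored. */
static uint32_t section_flags(const char *flags)
{
    uint32_t bits = 0;
    for (; *flags != '\0'; flags++) {
        switch (*flags) {
        case 'a': bits |= SHF_ALLOC;     break;
        case 'w': bits |= SHF_WRITE;     break;
        case 'x': bits |= SHF_EXECINSTR; break;
        default:                         break;
        }
    }
    return bits;
}
```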

TABLE B1.19 .section flags for ELF format files.

| Flag | Meaning |
| :--- | :--- |
| a | allocatable section |
| w | writable section |
| x | executable section |

```
.set <variable_name>, <variable_value>
```

This directive sets the value of a variable. It is similar to SETA in armasm.

```
.space <number_of_bytes> {,<fill_byte>}
```

Reserves the given number of bytes. The bytes are filled with zero or <fill_byte> if specified. It is similar to SPACE in armasm.

```
.word <word1> {,<word2>} ...
```

Inserts a list of 32-bit word values as data into the assembly, as for DCD in armasm.

